FIELD OF THE INVENTION

The invention relates to bacterial reverse transcriptase (RT) enzymes which are capable of
synthesizing a hybrid RNA-DNA molecule called msDNA, together with the genes which encode the
DNA and RNA portions of the molecule.

Another aspect of the invention relates to the isolation and purification of RTs from
bacteria which are capable of synthesizing msDNA. The invention deals with groups of prokaryotes
[...] bacterium capable of synthesizing msDNAs is identified by testing positive in an appropriate
screening test.



This is the first time that, as taught in the subject parent patent applications, reverse 
transcriptase has been found and isolated from a prokaryote. 

BACKGROUND OF THE INVENTION

Previously, there was described a chromosomal region of the bacterium Myxococcus
xanthus which coded for the RNA and DNA portions of an msDNA: Dhundale et al. (Dhundale '87),
"Structure of msDNA from Myxococcus xanthus: Evidence for a Long, Self-Annealing RNA
Precursor for the Covalently Linked, Branched RNA", Cell, Vol. 51, pages 1105-1112 (December 24,
1987). Dhundale et al. speculated that an Alu I nucleotide fragment contained all the essential coding
regions to produce an msDNA. This speculation turned out to be in error.



The Alu I fragment of Dhundale et al., in fact and inherently, did not contain the gene
sequence coding for an RT. The Alu I fragment was too short to encode an RT. This was proven by
sequence analysis with a computer program which searches for open reading frames that can
potentially code for a protein. The print-out of the sequence analysis clearly shows that there is
no translational reading frame in the Dhundale et al. fragment open across a stretch of DNA long
enough to encode any reverse transcriptase.

What is reported in Dhundale et al. in 1987 with respect to a bacterial reverse
transcriptase was totally contrary to accepted dogma at that time about the distribution of these
enzymes, i.e., that they were present only in viruses which infect eukaryotic organisms.

For the 20 years since the discovery of reverse transcriptase, it was believed that these
enzymes were restricted to viruses which infect eukaryotic cells. Now, in accordance with the
invention, reverse transcriptases have been identified in bacteria.

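The open-reading-frame search referred to above can be illustrated with a minimal sketch. It is not the program actually used for the analysis; the function names are hypothetical, and the 485-residue threshold is simply the length reported later in this description for the Mx162 ORF. The sketch scans all six reading frames of a DNA fragment and reports the longest stretch of codons uninterrupted by a stop codon.

```python
# Minimal six-frame scan for stop-free stretches (illustrative only).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq: str) -> int:
    """Longest run of consecutive non-stop codons in any of the six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0  # a stop codon closes the current frame
                else:
                    run += 1
                    best = max(best, run)
    return best


def could_encode(seq: str, min_residues: int) -> bool:
    """True if some frame stays open for at least min_residues codons."""
    return longest_open_stretch(seq) >= min_residues
```

Under these assumptions, a fragment for which `could_encode(fragment, 485)` is false cannot carry a complete reading frame of the size later found for the Mx162 RT, which is the kind of conclusion drawn above for the Alu I fragment.
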
In accordance with the invention, it is shown that various bacteria have nucleotide
sequences named "retrons" which encode reverse transcriptases (RTs) which are capable of
synthesizing msDNAs. The invention also relates to the isolated and purified bacterial RTs. It has
also been determined that the RTs of the bacteria which synthesize msDNAs possess common
conserved nucleotide sequences and amino acid residues.
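One conserved feature named repeatedly in the figure descriptions below is the YXDD box (tyrosine, any residue, aspartate, aspartate). As a hedged illustration, the following sketch locates every YXDD tetrapeptide in a protein sequence given in single-letter code. The function name and the test sequence are hypothetical and are not taken from the disclosed sequences.

```python
import re

# Lookahead pattern so that overlapping hits (e.g. "YYDDD") are all reported.
YXDD_PATTERN = re.compile(r"(?=(Y.DD))")


def find_yxdd(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, tetrapeptide) for each Y-X-D-D match."""
    return [(m.start() + 1, m.group(1))
            for m in YXDD_PATTERN.finditer(protein.upper())]


# Hypothetical toy sequence, used only to show the call.
print(find_yxdd("MKLYADDILV"))  # [(4, 'YADD')]
```

A real screen would be run on the translated ORF of a candidate retron; the presence of the box alone is only suggestive, not proof of RT activity.
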

Representative members of the Enterobacteriaceae, Rhizobiaceae and
Mycobacteriaceae families are demonstrated to be capable of synthesizing msDNA. These bacteria
can be screened for the capability of synthesizing msDNA by an RT labeling or extension in vitro
test.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the restriction map of the 3.4 kb fragment around msd and downstream
of msr.

Figure 2 shows the nucleotide sequence of the chromosomal region encompassing the
msDNA and msdRNA coding regions and an ORF region downstream of msr and the amino acid
sequence of Mx162-RT.

Figure 3 shows the amino acid sequence alignment of the msDNA-Mx162 ORF with
a portion of the retroviral Pol sequences from HIV and HTLV1 and the ORF of msDNA-Ec67.

Figure 4 shows the sequence similarity of the msDNA-Mx162 reverse transcriptase
with other retroelements.

Figure 5 shows the sequence comparison of the regions around the YXDD box of
various reverse transcriptases.

Figure 6 shows the detection of msDNA in a clinical isolate of E. coli.

Figure 7 shows the complete primary and proposed secondary structure of msDNA-Ec67.

Figure 8 shows the determination of the RNA nucleotide sequence for the branched
RNA linked to msDNA.

Figure 9 shows the Southern blot analysis of E. coli Cl-1 chromosomal DNA (A) and
analysis of msDNA synthesis by pCl-1E and pCl-1P (B).

Figure 10 shows the restriction map of the 11.6 kb Eco RI fragment.

Figure 11 shows the nucleotide sequence of the region from the E. coli Cl-1
chromosome encompassing the msDNA, msdRNA and ORF coding regions and the amino acid
sequence of Ec67-RT.

Figure 12 shows the amino acid sequence alignment of the E. coli msDNA ORF with
a portion of the retroviral Pol sequence from HIV and HTLV1.

Figure 13 shows the detection of RT activity from various cell extracts.

Figure 14 shows the amino acid sequence alignment of bacterial RTs.

Figure 15 shows the nucleotide and amino acid sequence of Mx65-RT.

Figure 16 shows the nucleotide and amino acid sequence of Sa163-RT.

Figure 17 shows the nucleotide and amino acid sequence of Ec73-RT.

Figure 18 shows the nucleotide and amino acid sequence of Ec86-RT.

Figure 19 shows the nucleotide and amino acid sequence of Ec107-RT.

Figure 20 shows the msDNAs from total RNA prepared from each bacterial strain,
specifically labeled with 32P by the RT extension method (12, 14).

Figure 21 shows a collection of 63 rhizobial isolates screened for the presence of
msDNA by the RT extension method.

Figure 1. Restriction Map of the 3.4-kb Fragment Around msd and Downstream
of msr.

The locations and the orientation of msDNA and msdRNA are indicated by a small
arrow and an open arrow, respectively. A large solid arrow represents an ORF and its orientation.
The only two Alu I sites (A and B) are shown, and the DNA sequence between Alu I (A) and Alu I (B)
was determined previously by Yee et al. (1984).

Figure 2. Nucleotide Sequence of the Chromosomal Region Encompassing the
msDNA and msdRNA Coding Regions and an ORF Region Downstream of msr.

The upper strand beginning at the Alu I (A) site (see Figure 1) and ending just beyond
the ORF is shown. Only a part of the complementary lower strand is shown, from base 301 to 600.
The boxed region of the upper strand (332-408) and the boxed region of the lower strand (401-562)
correspond to the sequences of msdRNA and msDNA, respectively (Dhundale et al., 1987). The
starting sites for DNA and RNA and the 5' to 3' orientations are indicated by open arrows. The
msdRNA and msDNA regions overlap at their 3' ends by 8 bases. The circled G residue at position
351 represents the branched rG of RNA linked to the 5' end of the DNA strand in msDNA. Long
solid arrows labeled a1 and a2 represent inverted repeat sequences proposed to be important in the
secondary structure of the primary RNA transcript involved in the synthesis of msDNA (Dhundale
et al., 1987). The ORF begins with the initiation codon at base 640. Single letter designations are
given for amino acids. The YXDD amino acid sequence highly conserved among known RT proteins
is boxed. Numbers in the right-hand column enumerate the nucleotide bases, and numbers with an
asterisk (*) enumerate amino acids. Small vertical arrows labeled Alu I and Sma I locate the Alu I
and Sma I restriction cleavage sites, respectively. The DNA sequence was determined by the chain
termination method (Sanger et al., 1977) using synthetic oligonucleotides as primers.
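
The coordinates quoted in the Figure 2 legend are internally consistent, which can be checked with a short sketch. The class and function names below are illustrative only; the numbers are those given in the legend (1-based, inclusive positions on the sequence of Figure 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Region, b: Region) -> int:
    """Number of positions shared by two inclusive regions."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


MSD_RNA = Region(332, 408)  # boxed region of the upper strand
MS_DNA = Region(401, 562)   # boxed region of the lower strand
BRANCHED_G = 351            # circled G residue

print(MSD_RNA.length)                      # 77 bases of msdRNA
print(MS_DNA.length)                       # 162 bases of msDNA
print(overlap(MSD_RNA, MS_DNA))            # 8-base overlap at the 3' ends
print(BRANCHED_G - MSD_RNA.start + 1)      # branched G is residue 20 of the RNA
```

The same arithmetic can be applied to the coordinates given for the other retrons in the legends that follow.
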

Figure 3. Amino Acid Sequence Alignment of the msDNA-Mx162 ORF with a
Portion of the Retroviral Pol Sequences from HIV and HTLV1 and the ORF of msDNA-Ec67.

Amino acid sequences are compared with matching residues assigned as follows: (o)
amino acid residues shared by all four proteins; (o) amino acid residues shared by msDNA-Mx162
and msDNA-Ec67 RTs; (x) amino acid residues shared by msDNA-Mx162 RT with HIV or HTLV1
RTs. Amino acid sequences shown are from residue 177 to 439 for HIV RT (Ratner et al., 1985);
residue 15 to 277 for HTLV1 RT (Seiki et al., 1983); residue 32 to 291 for Ec-67 RT (Lampson
et al., 1989); and residue 170 to 435 for Mx-162 RT (this work). The YXDD consensus sequence
is outlined with a box.



Figure 4. Sequence Similarity of the msDNA-Mx162 Reverse Transcriptase with
Other Retroelements. A. Sequence similarity of the region from residue 18 to 128 of the
msDNA-Mx162 RT (see Figure 2) with a carboxyl terminal region of the integrase of Moloney murine
leukemia virus (Mo-MLV) (residue 1070 to 1179; Shinnick et al., 1981). B. Comparison of the
sequence from residue 411 to 485 of the msDNA-Mx162 RT (see Figure 2) with the sequence from
residue 396 to 461 of the gag protein of human immunodeficiency virus (HIV; Ratner et al., 1985).



Figure 5. Sequence Comparison of the Regions Around the YXDD Box of
Various Reverse Transcriptases.

The region from residue 304 to residue 371 of the msDNA-Mx162 RT (see Figure
2) is aligned with various RTs from different sources. Amino acid residues identical to those of
the msDNA-Mx162 RT are indicated by open circles. The YXDD sequences are boxed. The residue
numbers for the amino terminal residues and for the carboxyl terminal residues are indicated at the
left- and right-hand sides of the sequences, respectively. Mx-162 RT from this work (Figure 2);
Ec-67 RT from Lampson et al. (1989); Ec-86 RT from Lim and Maas (1989); HIV RT from Ratner
et al. (1985); HTLV1 RT from Seiki et al. (1983); Mo-MLV RT from Shinnick et al. (1981); RSV
(Rous sarcoma virus) RT from Dickson et al. (1982); BLV (bovine leukemia virus) RT from Rice
et al. (1985); Mt. plasmid (Neurospora mitochondrial plasmid) RT from Nargang et al. (1984); 17.6
Drosophila retrotransposon from Saigo et al. (1984); Drosophila retrotransposon from Yuki
et al. (1986); Ta1-3 plant (Arabidopsis thaliana) retrotransposon from Voytas and Ausubel [...]; and
a Ty912 yeast retrotransposon from Clare and Farabaugh (1985). Small arrows in Copia, Ta1-3 and
Ty912 indicate positions of insertions of extra sequences of 18, 18 and 13 residues, respectively. B.
Phylogenetic relationships among various RTs listed in A. The branching positions are arbitrarily
illustrated.



Figure 6. Detection of msDNA in a clinical isolate of E. coli. Total RNA,
prepared (Maniatis et al., 1982) from a 5-ml culture, was added to 50 µl of a reaction mixture
containing: 50 mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 pM dATP, dTTP,
and dGTP; 0.04 dCTP; 0.2 [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim).
The reaction mixture was incubated at 37°C for 30 min, followed by extraction with 50 µl
phenol-chloroform (1:1) and ethanol precipitation. The samples were electrophoresed on a 4%
acrylamide - 8 M urea gel. Lanes: (S) molecular weight markers, an Msp I digest of pBR322
end-labeled with [α-32P]dCTP and the Klenow fragment of DNA polymerase I; (1) E. coli K-12 strain
C600; (2) the same as in lane 1 except the sample was treated with RNase A (5 µg, 10 min at 37°C)
just prior to electrophoresis; (3) clinical isolate Cl-1; (4) clinical isolate Cl-1 treated with
RNase A. The clinical isolate was identified as Escherichia coli. (The clinical E. coli strains were
urinary tract isolates kindly provided by Dr. Melvin Weinstein from the microbiology laboratory,
R.W. Johnson Hospital, New Brunswick, NJ. The clinical strain Cl-1 was identified using the
API-20E identification system (API laboratory products) and gave a typical E. coli profile number
of 5044552.)



Figure 7. The complete primary and proposed secondary structure of msDNA-Ec67.
The DNA sequence was determined by the Maxam and Gilbert method (Maxam et al., 1980)
using 3'-end labeled msDNA. The RNA sequence (msdRNA; boxed region) was determined using
base-specific RNases as previously described (Dhundale et al., 1987). The 2',5' branched linkage
between the 15th rG residue and the 5' end of the DNA strand was determined using the debranching
enzyme from HeLa cells as described previously (Dhundale et al., 1987; Furuichi et al., 1987; Ruskin
et al., 1985; Arenas et al., 1987; the debranching enzyme was a gift from Jerard Hurwitz). The
branched rG at position 15 is circled, and both RNA and DNA are numbered from their 5' ends.

Figure 8. Determination of the RNA nucleotide sequence for the branched RNA
linked to msDNA. Total RNA was prepared from the clinical strain Cl-1 and fractionated on a 5%
acrylamide gel. msDNA containing full length RNA was eluted from the gel. This fraction was then
labeled at the 5' end of the RNA with [γ-32P]ATP and T4 polynucleotide kinase. The 5' end labeled
RNA linked to msDNA was again purified on an 18% acrylamide - 8 M urea sequencing gel. The
labeled RNA was then sequenced using limited digestion with base-specific RNases as described
previously (Dhundale et al., 1987). Lanes: OH-, partial alkaline hydrolysis ladder (0.5 M sodium
bicarbonate/carbonate, pH 9.2); -E, no enzyme treatment of the labeled RNA linked to msDNA; T1,
RNase T1 (1 U/reaction, 55°C, 15 min); U2, RNase U2 (1 U and 0.5 U/reaction, 55°C, 15 min); PhyM,
RNase PhyM (1 U/reaction, 55°C, 15 min); Bc, RNase B. cereus (2 U/reaction, 55°C, 15 min); CL3,
RNase CL3 (2 U/reaction, 37°C, 15 min). The large gap in the sequence gel is due to msDNA linked
at the rG residue at position 15 by a 2',5' phosphodiester linkage (Furuichi et al., 1987). The RNA
sequence at the 3'-end region from the branched rG residue (the upper part of the gel) was determined
from a 6% gel (data not shown).



Figure 9. Southern blot analysis of E. coli Cl-1 chromosomal DNA (A) and analysis of
msDNA synthesis by pCl-1E and pCl-1P (B). A: The chromosomal DNA was digested with EcoRI
(lane 1), HindIII (lane 2), BamHI (lane 3), PstI (lane 4), and BglII (lane 5). For each lane, 3 µg of
the DNA digest was applied to a 0.7% agarose gel. After electrophoresis the gel was blotted to a
nitrocellulose filter, and hybridization analysis was carried out according to Southern (Southern,
1975) using msDNA labeled by AMV-RT with [α-32P]dCTP as a probe. Numbers at the left represent
the molecular weights in kb. B: Total DNA prepared from each strain was treated with RNase A,
separated on a 5% acrylamide gel and stained with ethidium bromide. Lane S, pBR322 digested with
Msp I used for molecular size markers; lane 1, DNA prepared from the host strain CL-83 (recA-);
lane 2, CL-83 (recA-) transformed with plasmid pCl-1E (11.6-kb EcoRI fragment; see Figure 10);
lane 3, with plasmid pCl-1P (2.8-kb PstI(a)-PstI(b) fragment; see Figure 10). An arrow indicates the
position of msDNA.




Figure 10. Restriction map of the 11.6-kb EcoRI fragment. In the Cl-1E map,
the left-hand half (EcoRI to HindIII) was not mapped. In the Cl-1EP5 map, the locations and the
orientations of msDNA and msdRNA are indicated by a small arrow and an open arrow, respectively.
A large solid arrow represents an ORF and its orientation.



Figure 11. Nucleotide sequence of the region from the E. coli Cl-1 chromosome
encompassing the msDNA and msdRNA coding regions and an ORF downstream of the msdRNA
coding region. The entire upper strand beginning at the Bal I site (see Figure 10) and ending just
beyond the ORF is shown. Only a part of the complementary lower strand is shown, from base 241 to
420. The long boxed region of the upper strand (249-306) corresponds to the sequence of the branched
RNA (msdRNA; see Figure 7) portion of the msDNA molecule. The boxed region of the lower strand
corresponds to the sequence of the DNA portion of msDNA (see Figure 7). The starting sites for DNA
and RNA and the 5' to 3' orientations are indicated by large open arrows. The msdRNA and msDNA
regions overlap at their 3' ends by 7 bases. The circled G residue at position 263 represents the
branched rG of RNA linked to the 5' end of the DNA strand in msDNA. Long solid arrows labeled
a1 and a2 represent inverted repeat sequences proposed to be important in the secondary structure of
the primary RNA transcript involved in the synthesis of msDNA (Dhundale et al., 1987). Note that
the nucleotide at position 257 (U on the RNA transcript) and the nucleotide at position 373 (G on the
RNA transcript) form a U-G pair in the stem between sequences a1 and a2. The proposed promoter
elements (-10 and -35 regions) for the primary RNA transcript are also boxed. The ORF begins with
the initiation codon at base 418. Single letter designations are given for amino acids. The YXDD
amino acid sequence conserved among known RT proteins is boxed. Numbers in the right-hand
column enumerate the nucleotide bases, and numbers with an asterisk (*) enumerate amino acids.
Small vertical arrows labeled H and P locate the HindIII and PstI restriction cleavage sites,
respectively. The DNA sequence was determined by the chain termination method (Sanger et al.,
1977) using synthetic oligonucleotides as primers.


Figure 12. Amino acid sequence alignment of the E. coli msDNA ORF with a
portion of the retroviral Pol sequence from HIV and HTLV1. Amino acid sequences are compared
with matching residues assigned as follows: (+) amino acid common to msDNA and HIV RTs; (o)
amino acid shared by msDNA and HTLV1 RTs; and (o) amino acid shared by all three proteins.
Arrows divide the protein sequences into three functional domains (Toh et al., 1983; Geng et al.,
1985; Varmus, 1985; Tanese et al., 1988): an amino terminal RT domain, a carboxy terminal RNase H
region, and a central "tether" region. The specific amino acid residues for the RT, tether, and
RNase H domains for each protein are: HIV, 177-439, 440-600, 601-722, respectively; HTLV1,
15-277, 278-462, 463-592, respectively; msDNA ORF, 32-290, 291-465, 466-586, respectively. The
YXDD polymerase consensus sequence is outlined with a box.



Figure 13. Detection of RT activity from various cell extracts. Crude cell extracts
were prepared from E. coli strain C2110 (polA-) (Tanese et al., 1985; Tanese et al., 1986; E. coli
strain C2110 (polA1-) was a gift from M. Roth and S. Goff) containing plasmid pCl-1EP5 encoding
the msDNA-ORF (see Figure 10) as well as the vector plasmid (pUC9; Yanisch-Perron et al., 1985)
alone. Extracts were also prepared from the E. coli strain PRTS7-1 (polA+) containing the cloned
M-MuLV RT gene (Varmus et al., 1985; Tanese et al., 1977; Tanese et al., 1985; Tanese et al., 1986).
Crude extracts were prepared essentially as described (Roth et al., 1985; Hizi et al., 1988). Crude
extract equivalent to 15 µg total protein was added to a 50 µl reaction cocktail (50 mM Tris-HCl
pH 7.8, 10 mM DTT, 60 mM NaCl, 0.05% NP-40, 10 mM MgCl2, 0.5 µg poly(rC)-oligo(dG), and 0.1 pM
[α-32P]dGTP) and incubated at 37°C for one hour. Five µl of the reaction mixture was then spotted
onto DEAE paper (DE81; Whatman Inc.). The paper was washed to remove unincorporated label
(Tanese et al., 1985; Tanese et al., 1986) and then exposed to an X-ray film. In row (A) all
reactions contain added template-primer (poly rC-dG). Row (B) contains control reactions in which
no template-primer is added. Columns contain the designated cell extracts: M-MuLV, cloned Moloney
Murine Leukemia Virus RT gene; pGB2 (Churchward et al., 1984), vector plasmid in strain C2110;
pCl-1EP5, recombinant plasmid with the cloned msDNA gene. The large amount of background activity
observed with the M-MuLV control extract is due to the activity of DNA Polymerase I, since this
extract is obtained from a PolA+ strain (HB101).

Figure 14 shows the amino acid sequence alignment of bacterial RTs carried out
according to Xiong and Eickbush (1990). Amino acids highly conserved in eukaryotic RTs are shown
at the top of the sequences. These amino acids include largely unvaried residues or chemically
similar residues: (h) hydrophobic residue; (p) small polar residue; (c) charged residue. Amino acids
conserved in all seven bacterial RTs (identical residues plus functionally conserved residues
indicated by h for hydrophobic residues or p for polar residues) are marked by solid dots at the
bottom of the sequences. The consensus sequence shown at the bottom of the sequences is determined
when five out of seven sequences contain an identical or a chemically similar residue (h,
hydrophobic residue; p, charged and polar residue). The subdomains 1 to 7, which are boxed and
indicated by numbers, are according to Xiong and Eickbush (1990). The highly conserved YXDD
sequences are also boxed. Numbers on the right indicate the amino acid positions from the amino
terminus for each RT. Sources for the sequences are Sa163 (Hsu et al. 1992b), Mx162 (Inouye et al.
1989), Mx65 (Inouye et al. 1990), Ec67 (Lampson et al. 1989b), Ec86 (Lim and Maas 1989), Ec73 (Sun
et al. 1991), and Ec107 (Herzer et al. 1992).
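
The five-out-of-seven rule described for the consensus line of Figure 14 can be sketched as follows. This is only an illustration: the assignment of individual amino acids to the hydrophobic (h) and charged/polar (p) classes is an assumption made here, since the legend does not list the class members, and the seven short sequences in the usage example are hypothetical rather than taken from the figure.

```python
from collections import Counter

# Assumed class membership; the legend does not spell these sets out.
HYDROPHOBIC = set("AVLIMFWYC")
CHARGED_OR_POLAR = set("STNQHDEKRG")


def consensus_column(column: str, threshold: int = 5) -> str:
    """Consensus symbol for one alignment column under the 5-of-7 rule."""
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count >= threshold:
        return residue  # identical residue in enough sequences
    if sum(r in HYDROPHOBIC for r in column) >= threshold:
        return "h"      # chemically similar: hydrophobic
    if sum(r in CHARGED_OR_POLAR for r in column) >= threshold:
        return "p"      # chemically similar: charged or polar
    return "."          # no consensus at this position


def consensus(alignment: list[str], threshold: int = 5) -> str:
    """Consensus line for equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(consensus_column("".join(col), threshold)
                   for col in zip(*alignment))


# Hypothetical seven-sequence alignment around a YXDD-like box.
toy = ["YADD", "YVDD", "YLDD", "YMDD", "YIDE", "YSDK", "YADR"]
print(consensus(toy))  # YhDp
```

In the toy alignment the first and third columns are invariant, the second column is hydrophobic in six of seven sequences, and the fourth is charged or polar in all seven without any single residue reaching five.
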

Figure 15 shows the nucleotide sequence of the chromosomal region encompassing the
Mx65-msDNA and msdRNA coding regions and an ORF region downstream of msr. The sequence
covers from the Alu I (A) site to 78 bp downstream of the ORF. The complementary strand is only
shown from bases 121-300. The boxed region of the upper strand (positions 143-191) and the boxed
region of the lower strand (positions 186-250) correspond to the sequences of msdRNA and msDNA,
respectively. The starting sites for DNA and RNA and the 5' to 3' orientation are indicated by open
arrows. The msdRNA and msDNA regions overlap at their 3' ends by 6 bases. The circled G residue
at position 206 represents the branched guanosine of RNA linked to the 5' end of the DNA strand in
msDNA. Long solid arrows labeled a1 and a2 represent inverted repeat sequences proposed to be
important in the secondary structure of the primary RNA transcript involved in the synthesis of
msDNA. The ORF begins with the initiation codon at base 279. The YXDD amino acid sequence
highly conserved among known RT proteins is boxed. Numbers in the right-hand column enumerate
the nucleotide bases, and numbers with asterisks enumerate amino acids (single-letter code). The
DNA sequence was determined by the chain-termination method using synthetic oligonucleotides as
primers.


Figure 16 shows nucleotide sequences of 3,060 bases encompassing msr, msd, and the
RT gene of S. aurantiaca. The region from [...] to base 720, which contains msr and msd, is
shown double stranded. The boxed regions of the upper strand (bases 440 to 540) and the lower
strand (bases 508 to 670) correspond to the sequences of msdRNA and msDNA, respectively. The
starting sites for msDNA and msdRNA are indicated by open arrows. The circled G at position
458 is the branched rG of msdRNA linked to the 5' end of msDNA. Long solid arrows labeled
a1 and a2 represent inverted repeat sequences proposed to form the secondary structure in the
primary RNA transcript which serves to prime msDNA synthesis. Amino acids are indicated by
single letters. The YXDD sequence highly conserved among known RTs is boxed. X* and sites
are indicated by arrows. Numbers on the right-hand side and numbers with asterisks represent
numbers for bases and amino acids, respectively.

Figure 17 shows the sequences of msdRNA and msDNA, which are boxed; their
orientations are indicated by open arrows. The branched G residue at position 10425 is circled. The
inverted repeat sequences required for the biosynthesis of msDNA-Ec73 are shown by arrows labeled
a1 and a2. Amino acid residues of Ec73-RT are shown by a single-letter code placed at the center of
each codon.


Figure 18 shows the restriction map of the 3.5 kb insert of pDB808 and the nucleotide
sequence of chromosomal determinants of the msDNA-RNA compound of E. coli B. (A) Restriction
map of the 3.5 kb insert of clone pDB808. The solid bar represents the region whose sequence is
presented in (B). Transcription is from left to right. Restriction enzymes are: P, PstI; H, HpaI; B,
BglII; X, XhoI. (B) Nucleotide sequences of the chromosomal determinants. Only the strand
corresponding to the transcript is shown. Nucleotides are numbered starting from the first base
observed in the msdRNA. The msdRNA coding region is overlined, and the msDNA coding region
is underlined. The msDNA sequence is complementary to the sequence shown in this figure. Inverted
repeats are indicated by double-dashed lines. The G at position 14 is the branched guanylate of
msdRNA in the msDNA-RNA compound. IR, 12 bp inverted repeat.

Figure 19 shows the sequence of the retron and flanking regions of Ec107. The sequences
corresponding to the K-12 genomic DNA are shown in lower case letters from bases 1-99 and
1400-1540. The msdRNA and msDNA regions are boxed. Also indicated are the a1-a2 conserved
inverted repeats (indicated by arrows) and the branched G, which is circled. The RT consists of 319
amino acids and contains the YXDD sequence (boxed) which is highly conserved among known RTs.
The transcription start site occurs at base 170; a possible terminator is indicated by head-to-head
arrows following the RT coding region. Primer extension was utilized in order to determine the
transcription start site. These sequence data will appear in the EMBL/GenBank/DDBJ Nucleotide
Sequence Data Libraries under the accession number X62583.

The description which follows describes msDNA and RT from Myxococcus xanthus.
This is a typical bacterium which belongs to a genus of bacteria whose representative members
possess an RT capable of synthesizing msDNA.

The existence of a peculiar branched RNA-linked DNA molecule called msDNA
(multicopy single-stranded DNA) has been demonstrated in various myxobacteria, Gram-negative soil
bacteria (Yee et al., 1984; Dhundale et al., 1985; Furuichi et al., 1987a,b; Dhundale et al., 1987;
Dhundale et al., 1988b). msDNA (msDNA-Mx162) from Myxococcus xanthus consists of a 162-base
single-stranded DNA, the 5' end of which is linked to the 2' position of the 20th rG residue of a
77-base RNA molecule (msdRNA) by a 2',5'-phosphodiester linkage (Dhundale et al., 1987). It exists
at a level of approximately 700 copies per genome. Stigmatella aurantiaca also possesses an msDNA
(msDNA-Sa163) which is highly homologous to msDNA-Mx162 (Furuichi et al., 1987b). In addition
to msDNA-Mx162, M. xanthus has another, smaller species of msDNA (mrDNA or msDNA-Mx65),
which has no primary sequence homology with msDNA-Mx162 or msDNA-Sa163 (Dhundale et al.,
1988b). However, all msDNAs so far characterized share key structural features such as a branched
rG residue, stem-and-loop structures in RNA and DNA molecules, and a DNA-RNA hybrid at the
3' ends of DNA and RNA molecules.

Previously it was predicted that reverse transcriptase is required for msDNA
biosynthesis on the basis of the finding that msdRNA is derived from a much longer precursor, which
can form a very stable stem-and-loop structure (Dhundale et al., 1987). This precursor molecule was
proposed to serve as a primer for initiating msDNA synthesis as well as a template to form the
branched RNA-linked msDNA. The latter reaction requires reverse transcriptase activity. In M.
xanthus, the region coding for the RNA molecule (msr) is located on the chromosome in the opposite
orientation to the msDNA coding region (msd), with the 3' ends overlapping by 6 bases for
msDNA-Mx65 (Dhundale et al., 1988b) or by 8 bases for msDNA-Mx162 (Dhundale et al., 1987). In
addition, as in all the msDNAs found in myxobacteria, there is an inverted repeat comprised of a
14-base sequence for msDNA-Mx65 (Dhundale et al., 1988b) or a 34-base sequence for msDNA-Mx162
(Dhundale et al., 1987) and a 33-base sequence for msDNA-Sa163 (Furuichi et al., 1987b) immediately
upstream of the branched G residue and a sequence immediately upstream of the msDNA coding
region. As a result of this inverted repeat, a longer primary transcript beginning upstream of the
RNA coding region and extending through the msDNA coding region is considered to self-anneal and
form a stable secondary structure. When three base mismatches were introduced into the secondary
structure immediately upstream of the branched rG residue, msDNA synthesis was almost completely
blocked. However, if three additional base substitutions were made on the other strand to restore the
complementary base pairing, msDNA production was restored (Hsu et al., 1989). This result strongly
supports the proposed model for msDNA synthesis.
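
The logic of the mismatch and compensatory-substitution experiment can be made concrete with a small sketch. The sequences below are hypothetical 8-base stand-ins for the inverted repeats a1 and a2, not the actual repeat sequences, and the function names are illustrative. The sketch counts base pairs in the stem (including G-U wobble pairs, one of which is noted in the Figure 11 legend) for the wild-type stem, for a stem with three substitutions in one arm, and for the stem after matching substitutions are made in the other arm.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def stem_pairs(arm1: str, arm2: str) -> int:
    """Base pairs formed when arm1 (5'->3') anneals antiparallel to arm2."""
    return sum((x, y) in PAIRS for x, y in zip(arm1, reversed(arm2)))


# Hypothetical arms, for illustration only.
a1_wild = "GGCAUCGA"
a2_wild = rna_reverse_complement(a1_wild)      # perfectly complementary arm

a1_mutant = "GGGUACGA"                          # three substitutions in a1
a2_compensated = rna_reverse_complement(a1_mutant)

print(stem_pairs(a1_wild, a2_wild))             # 8: intact stem
print(stem_pairs(a1_mutant, a2_wild))           # 5: three mismatches
print(stem_pairs(a1_mutant, a2_compensated))    # 8: pairing restored
```

The third case has a different primary sequence from the wild type but the same pairing, which mirrors the observation that restoring complementarity, rather than the original sequence, restored msDNA production.
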

It was also shown that a deletion mutation at the region 100 base pairs (bp) upstream
of the DNA coding region (msd) and an insertion mutation at a site 500 bp upstream of msd caused
a significant reduction in msDNA production (Dhundale et al., 1988a). This indicates that there is
a cis- or trans-acting positive element required for msDNA synthesis in this region. In this report
we determined the DNA sequence of this region and found an open reading frame (ORF) coding
for 485 amino acid residues beginning with an initiation codon, ATG, which is located 77 bp
upstream of msd (or 231 bp downstream of msr). The very close proximity between msd and the ORF
suggests that they may be transcribed as a single transcript. The amino acid sequence of the ORF
shows similarity with retroviral reverse transcriptases. We discuss a possible origin of the reverse
transcriptase gene as well as a possible relationship between the msDNA system and retroviruses.
Recently, some strains of Escherichia coli were found to produce msDNA, and the gene for reverse
transcriptase, which is essential for msDNA production, was found to be linked to the msd region
(Lim and Maas, 1989; Lampson et al., 1989b). Comparison of the msDNA systems of M. xanthus and
E. coli raises an intriguing question as to how the extensive diversity found in msDNA systems has
emerged in bacteria and what possible functions msDNA may have.
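
The spacings quoted above for the M. xanthus ORF follow from the coordinates given in the Figure 2 legend (msdRNA at 332-408, msDNA at 401-562, initiation codon at base 640). The sketch below is only an arithmetic check; the constant and function names are illustrative.

```python
def bases_between(left_end: int, right_start: int) -> int:
    """Bases lying strictly between two 1-based, inclusive coordinates."""
    return right_start - left_end - 1


MSR_END = 408       # 3' end of msr on the upper strand
MSD_START = 562     # msd reads right to left, so it starts at its highest coordinate
ORF_START = 640     # first base of the initiation codon
ORF_RESIDUES = 485  # length of the predicted polypeptide

print(bases_between(MSD_START, ORF_START))  # 77 bp upstream of msd
print(bases_between(MSR_END, ORF_START))    # 231 bp downstream of msr
print(ORF_RESIDUES * 3)                     # 1455 coding bases, stop codon excluded
```
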

In a preceding paper, it was demonstrated that msDNA is in fact synthesized by 

Reverse transcriptases are isolated and, if desired, purified, and biological
characterization is carried out, if desired, by known methods such as those described in Lampson,
B.C., M. Viswanathan, M. Inouye and S. Inouye, "Reverse Transcriptase from Escherichia coli Exists
as a Complex with msDNA and is Able to Synthesize Double-stranded DNA", J. Biol. Chem. 265:
8490-8496 (1990), which is incorporated by reference as if fully set forth herein.




RESULTS AND DISCUSSION 



Identification of an ORF Associated with msd

On the basis of mutations closely associated with msd which significantly reduce
msDNA production, it was assumed that in this region there is a cis- or trans-acting element which
is essential for msDNA synthesis (Dhundale et al., 1988a). Figure 1 shows a restriction map around
msd. The msDNA coding region is shown by a thin arrow from right to left (msd), and the msdRNA
coding region by a thick open arrow (msr). In the previous work (Dhundale et al., 1988a), two
mutations were constructed: one, a deletion mutation in which the sequence from AluI(b) to SmaI
was replaced by a gene for kanamycin resistance (see Figure 1), and the other, an insertion mutation
at the SmaI site by a gene for kanamycin resistance (see Figure 1).
In order to elucidate the properties of the element required for msDNA production,
the DNA sequence of the region upstream of msd was determined as shown in Figure 2. A long open
reading frame (ORF) beginning with an initiation codon was found 77 bases upstream of msd. The
ORF is preceded by a ribosome binding sequence of AGG (residue 630 to 632) 7 bases upstream of
the initiation codon. The ORF codes for a polypeptide of 485 amino acid residues. The AluI(b) and
SmaI sites (see Figure 1), where mutations inhibiting msDNA synthesis were created, are located at
amino acid residues 12 and 142 of the ORF, respectively, or at the nucleotide sequence from residue
672 to 675 and from residue 1061 to 1066, respectively (Figure 2). In Figure 2, msd, or the DNA
sequence corresponding to the msDNA sequence, is indicated by the closed box on the lower strand
and the orientation is from right to left. Similarly, the msdRNA sequence (msr) is also indicated by
the closed box on the upper strand and the orientation is from left to right. The msd and msr regions
overlap by 8 bases. An inverted repeat is also indicated by arrows with letters a1 and a2. This
inverted repeat comprises a 34-base sequence immediately upstream of the branched G residue
(residue 317 to 350; sequence a2 in Figure 2) and another 34-base sequence at the 3' end (residue 597
to 564; sequence a1). This inverted repeat is essential to form a stem structure which provides a stable
secondary structure in a long primary transcript. This secondary structure is considered to serve as
the primer as well as the template for msDNA synthesis (Dhundale et al., 1987; Hsu et al., 1989).
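
The relation between the amino acid residue numbers and the nucleotide positions quoted above can
be checked with a short calculation. The following is only an illustrative sketch. It assumes that the
ATG initiation codon occupies nucleotides 640 to 642 of Figure 2, that is, that seven bases lie
between the ribosome binding sequence at 630 to 632 and the ATG; this position is inferred from the
text and is not stated in it. The names ORF_START, codon_span and overlaps are hypothetical.

```python
# Assumption (inferred, not stated in the text): the ATG initiation codon
# occupies nucleotides 640-642 of Figure 2.
ORF_START = 640

# Restriction sites as nucleotide spans quoted in the text (1-based, inclusive).
ALU_I_B = (672, 675)
SMA_I = (1061, 1066)


def codon_span(residue, orf_start=ORF_START):
    """Return the 1-based (first, last) nucleotide positions of a codon."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    first = orf_start + 3 * (residue - 1)
    return first, first + 2


def overlaps(span_a, span_b):
    """True if two inclusive spans share at least one position."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]


if __name__ == "__main__":
    for residue, site in ((12, ALU_I_B), (142, SMA_I)):
        span = codon_span(residue)
        print(residue, span, site, overlaps(span, site))
```

Under this assumption the codons for residues 12 and 142 fall inside the AluI(b) and SmaI
recognition sequences, respectively, which is consistent with the positions given above.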



Sequence Similarity with Retroviral Reverse Transcriptases 

When the amino acid sequence of the ORF was compared with known proteins, a
striking similarity was found between the sequence from Leu-308 to Ser-351 and retroviral reverse
transcriptases (RT). In particular, this region contains the YXDD sequence, the highly conserved
sequence in all known RTs. This sequence (Tyr-344 to Asp-347) is boxed in Figure 2. In Figure 3,
the ORF sequence of 266 amino acid residues from Ala-170 to Lys-435 is compared with RTs from
HIV (human immunodeficiency virus; Ratner et al., 1986) and HTLV1 (human T-cell leukemia virus
type 1; Seiki et al., 1983). As mentioned above, within the sequence of 44 amino acid residues from
Leu-308 to Ser-351, there are 14 and 12 identical residues with HIV (32%) and HTLV1 (27%),
respectively. The entire RT domains of HIV and HTLV can also be aligned with the ORF sequence
from Ala-170 to Lys-435, with much less similarity, as shown in Figure 3. However, the same region
was found to be extremely well aligned with the RT which was recently found in a clinical strain of
Escherichia coli (Lampson et al., 1989b). This E. coli RT consists of 586 amino acid residues, and
its amino terminal domain (residue 32 to 291) and carboxyl terminal domain (residue 466 to 586)
have been demonstrated to have sequence similarity with retroviral RT and ribonuclease H. This
RT gene from E. coli was shown to be required for the production of msDNA (msDNA-Ec67) and
to have reverse transcriptase activity (Lampson et al., 1989b). Figure 3 shows that the sequence
similarity between E. coli and M. xanthus RTs is distributed within almost the entire RT region:
in the region from Gly-226 to Gly-265, 19 out of 40 residues (48% similarity); in the region from
Leu-308 to Ser-351, 26 out of 44 residues (59% similarity); and in the region from Lys-354 to
Asn-408, 21 out of 55 residues (38% similarity). Overall, similarity from Ala-170 to Lys-435 is 32%
(85 out of 266 residues are identical). In spite of these similarities, the M. xanthus ORF does not
have the domain which shows apparent sequence similarity with ribonuclease H (RNase H). The RNase
H domain is found to be located in the carboxyl terminal region of the same polypeptide in which the
RT domain exists in the amino terminal region in the case of the E. coli RT and other retroviral RTs.
In the preceding paper, it was shown that there is a precise coupling between RT and RNase H
activity (Lampson et al., 1989a). Therefore, RNase H may still reside with the ORF, or RNase H may
be encoded by a separate gene.
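
The similarity figures quoted above are simple ratios of identical residues to the length of the
compared region. As a hedged illustration only, the sketch below recomputes them from the counts
given in the text. The function names and the choice of rounding (half up, to a whole percent) are
assumptions made for this example and are not taken from the cited work.

```python
import math


def span_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def percent_identity(identical, aligned_length):
    """Identical positions as a whole-number percentage, rounded half up."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return math.floor(100 * identical / aligned_length + 0.5)


# (first residue, last residue, identical residues, percentage stated in the text)
REPORTED = [
    (226, 265, 19, 48),   # Gly-226 to Gly-265, M. xanthus vs E. coli RT
    (308, 351, 26, 59),   # Leu-308 to Ser-351, M. xanthus vs E. coli RT
    (354, 408, 21, 38),   # Lys-354 to Asn-408, M. xanthus vs E. coli RT
    (170, 435, 85, 32),   # Ala-170 to Lys-435, M. xanthus vs E. coli RT
]

if __name__ == "__main__":
    for first, last, identical, stated in REPORTED:
        length = span_length(first, last)
        print(first, last, length, percent_identity(identical, length), stated)
```

The same calculation applied to the 44-residue region gives 32% for 14 identical residues (HIV) and
27% for 12 identical residues (HTLV1).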



Sequence Similarity with Other Proteins



In contrast to the E. coli RT and other retroviral RTs, the ORF found in M. xanthus
has a long amino terminal extra domain consisting of approximately 170 residues. Interestingly, this
region shows some sequence similarities with the carboxyl terminal region associated with the
integration protein of Mo-MLV (Moloney murine leukemia virus; Shinnick et al., 1981) (see Figure 4A);
the sequence from Pro-18 to Leu-128 of the ORF shows 22% similarity (24 out of 111 residues) with the
region from Pro-1070 to Leu-1179 of the gag-pol polyprotein of Mo-MLV. It should be noted that
this region of Mo-MLV is unique for the Mo-MLV integration protein and does not share sequence
similarity with other retroviral endonucleases (Johnson et al., 1986). It is also interesting to notice
that in the Ty retrotransposon, this domain is located in front of the RT domain in contrast to the
retroviral endonuclease domain (Clare and Farabaugh, 1985).

As pointed out above, the ORF does not have homology to E. coli or retroviral RNase
H. Instead, it has a short sequence of approximately 80 residues after the RT domain. In this region,
one can also find sequence similarity with a part of the gag region of HIV. As shown in Figure 4B,
the sequence from Gly-411 to Glu-485 has 22 identical amino acid residues (31% similarity) with the

Requirement of Reverse Transcriptase

The fact that disruption of the ORF significantly reduced msDNA production in M.
xanthus (Dhundale et al., 1988a) and the fact that the ORF has sequence similarity with retroviral RTs
strongly support the previous hypothesis that RT is required for the synthesis of msDNA (Dhundale
et al., 1987). Recently, we were able to demonstrate that msDNA is indeed synthesized by reverse
transcriptase activity in a cell-free system (Lampson et al., 1989a). The fact that a small amount of
msDNA (3% of the wild type level) is still produced in the ORF mutants (Dhundale et al., 1988a) is
most likely due to another RT associated with a smaller msDNA (msDNA-Mx65; previously assigned
mrDNA; Dhundale et al., 1988b). In fact, an ORF has been found to be associated with the region
responsible for msDNA-Mx65 production.

At present it is unknown if the ORF is transcribed together with msdRNA from a
common upstream promoter or if the ORF has its own independent promoter. Previously, a major
RNA transcript of approximately 375 bases was identified by S1 mapping (Dhundale et al., 1987).
This transcript covers the region from approximately 75 bases upstream of msr (at around residue 256
in Figure 2) to approximately 70 bases upstream of msd (at around residue 632 in Figure 2). This
indicates that this RNA transcript ends at the ribosome binding site (AGG, 630-632) of the ORF.
It is possible that the primary RNA transcript covers not only the msr-msd region but also the entire
ORF. This transcript of at least approximately 2 kilobases (kb) is then used as the mRNA for the
ORF to produce RT. At the same time, the 5' untranslated region of 350 bases forms a stable
secondary structure which serves as a primer and a template for msDNA synthesis as previously
proposed (Dhundale et al., 1987). Because of the secondary structure, the 5' end region is probably
much more stable than the ORF mRNA region. As a result, only the 375-base RNA from the 5' end
of the transcript was detected in the previous work. In E. coli, the RT gene was shown to be
transcribed from a single promoter for the msr region (Lampson et al., 1989b).

Evolution of Reverse Transcriptase

All of the RTs so far identified are from eukaryotic origins, and associated with either
retroviruses or retrotransposons. DNA synthesis for retroviruses and transposition events for
retrotransposons occur via RNA which is used as a template for RTs (see review by Varmus, 1985).
From amino acid similarity in various RTs, possible evolutionary relationships among these RTs have
been proposed (Yuki et al., 1986).

The present invention demonstrates that RTs are not specific to eukaryotes but exist
in prokaryotes as well. An intriguing question arises as to the evolutionary relationship between
prokaryotic and eukaryotic RTs and the origin of RT. In order to compare the amino acid sequences
of these RTs, the sequence of the M. xanthus RT from Gly-304 to Leu-371 was chosen, since this
sequence includes the YXDD box, the most conserved region among different RTs. In Figure 5A this
sequence is compared with 13 other representative RTs from bacteria, yeast, plant, mitochondrial
plasmid, and animal retroviruses. Within these 14 sequences, the D-D sequence (residues 346 and
347) is completely conserved, and both G-311 and Y-344 are also well conserved except for Ty-RT.
Besides these residues, L-308, P-309, Q-310, S-315, P-316, L-330, S-351, and L-371 are fairly well
conserved among these sequences. On the basis of the numbers of identical amino acid residues,
M. xanthus RT has the following similarities with other RTs: 47% (32 amino acid residues) with
E. coli Cl-1 RT; 41% (28) with E. coli B RT; 24% (16) with HIV, BLV, and mitochondrial plasmid RTs;
22% (15) with Mo-MLV RT; 21% (14) with RSV, 17.6, gypsy, and Ta1-3 RTs; 19% (13) with HTLV1 RT;
15% (10) with Ty912 RT; and 9% (6) with Copia RT. On the basis of the phylogenetic relationships
among RTs proposed by Yuki et al. (1986), and the present data, a dendrogram of homology of
various RTs may be constructed as shown in Figure 5B. As proposed earlier (Yuki et al., 1986),
modern RTs are composed of two major groups, I and II. One group (group II) consists of
retrotransposons found in yeast (Ty912), plant (Ta1-3), and Drosophila (Copia). Bacterial RTs seem
to belong to the other group (group I) together with other retrotransposons from Drosophila such as
17.6 and gypsy, mitochondrial plasmid RT, and retroviral RTs. This indicates that both prokaryotic
and eukaryotic RT genes were possibly derived from a single ancestral RT gene.
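
The YXDD box referred to above is a four-residue pattern (Tyr, any residue, Asp, Asp) and can be
located in a protein sequence with a simple pattern search. The following is a minimal sketch, not
the method used to prepare Figure 5A. The function name find_yxdd and the peptide TOY_PEPTIDE are
hypothetical, and the peptide is an invented string, not a sequence from any of the RTs discussed.

```python
import re

# Y, any single residue, D, D. The lookahead allows overlapping matches.
YXDD = re.compile(r"(?=(Y[A-Z]DD))")


def find_yxdd(protein, first_residue=1):
    """Return (residue_number, motif) for each Y-X-D-D match.

    first_residue is the residue number of the first character of `protein`,
    so that a sub-sequence can be searched while keeping its numbering.
    """
    text = protein.upper()
    return [(m.start() + first_residue, m.group(1)) for m in YXDD.finditer(text)]


# Hypothetical peptide used only to exercise the function.
TOY_PEPTIDE = "AAAYMDDGGGYKDDTT"

if __name__ == "__main__":
    print(find_yxdd(TOY_PEPTIDE))
```

For a window that begins at a residue other than 1 (for example a window starting at Gly-304), the
first_residue argument keeps the reported positions in the numbering of the full-length protein.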
Origin of the M. xanthus Reverse Transcriptase

In addition to the sequence similarity between the M. xanthus RT and RTs from
retroviruses and retrotransposons, msDNA shares other interesting similarities with retroviruses and
retrotransposons; msDNA synthesis (synthesis of single-stranded DNA) starts at a site 77 bases
upstream of the RT gene and the orientation of DNA synthesis is opposite to the direction of
translation of the RT gene. In the case of retroviruses and retrotransposons, single-stranded DNA
synthesis proceeds at the 5'-end untranslated region of an RNA molecule which serves as the mRNA
for RT as well (Weiss et al., 1985). The orientation of DNA synthesis is also opposite to the direction
of translation of the RT gene. In the case of msDNA synthesis, an RNA transcript itself serving as a
template also serves as a primer by self-annealing to form a stable secondary structure (Dhundale et
al., 1987), whereas in the case of retroviruses and retrotransposons tRNAs are recruited from the cell
for the priming reaction. At present it is unknown if branched RNA-linked msDNA is the final product
of an unknown function or if it is a stable intermediate leading to other products.

Furthermore, it is of great interest whether the M. xanthus RT is associated with a
complex such as the virus-like particles found for the yeast Ty1 element (Eichinger and Boeke,
1988). In a preliminary experiment, msDNA of M. xanthus was found to exist as a complex with
proteins in the cell which sediments as a 22S particle. Characterization of this complex may shed
light on questions concerning the relationship between msDNA and retrocomponents as well as the
functions of msDNA.

At present, there is no information to support the possibility that msDNA may be a
transposable element or an element associated with a provirus (or prophages). It is important to point
out that the RT gene from M. xanthus appears to be as old as other genomic genes for the following
reasons: (a) Nine independent natural isolates of M. xanthus from various sites (including Fiji Island
and eight different sites in the United States) contained mutually hybridizable msDNA (Dhundale et
al., 1985). Since under the same hybridization condition, msDNA-Mx162 did not hybridize with
msDNA-Sa163 [which has extensive homology in both DNA and RNA sequences with msDNA-Mx162;
Dhundale et al. (1987)], the nine independent strains of M. xanthus are assumed to contain
almost identical msDNA. (b) The codon usage of the Mx162 RT is almost identical to those found
in other M. xanthus genes (Table 1). M. xanthus is known to have a very high G+C content (70%;
Johnson and Ordal, 1968) and as a result, all the genes so far characterized have very high G+C
contents at the third positions of codons used: 85.4% for vegA (Komano et al., 1987), 85.7% for ops
(Inouye et al., 1983), 87.2% for tps (Inouye et al., 1983), 88.4% for mbhA (Romeo et al., 1986), and
93.9% for sigma factor. The average G+C content of the third positions is calculated to be 90.0% for
these genes (Table 1). Surprisingly, the G+C content of the third positions of the RT codons is
highest among these genes (95.5%; Table 1).
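
The G+C content at the third codon position is obtained by taking every third base of an in-frame
coding sequence and counting the fraction that is G or C. The sketch below illustrates the
calculation only; it is not the procedure used to prepare Table 1. The function name gc3_percent and
the sequence TOY_CDS are hypothetical, and the sequence is an invented example, not part of any
M. xanthus gene.

```python
def gc3_percent(coding_sequence):
    """Percentage of codons whose third base is G or C.

    The sequence must be in frame and a whole number of codons long.
    RNA input (U) is accepted and treated as DNA (T).
    """
    seq = coding_sequence.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must contain a whole number of codons")
    third_positions = seq[2::3]  # bases 3, 6, 9, ...
    gc = sum(base in "GC" for base in third_positions)
    return 100.0 * gc / len(third_positions)


# Hypothetical six-codon sequence: ATG GCC CGC GAA CTG ACC
TOY_CDS = "ATGGCCCGCGAACTGACC"

if __name__ == "__main__":
    print(round(gc3_percent(TOY_CDS), 1))
```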

In contrast, the E. coli msDNA system including the RT gene is considered to have
been acquired much later in the evolution of E. coli. Reasons for this conclusion include: (a) Only
four strains out of 89 independent clinical E. coli strains were found to produce msDNAs (Lampson
et al., 1989b). (b) The codon usage of the E. coli RT is significantly different from the general codon
usage of E. coli genes obtained from 199 E. coli genes (Maruyama et al., 1986). In particular, out of
62 arginine codons used in the E. coli RT, 40 (65%) use AGA or AGG in contrast to 2.7% for the
AGA+AGG usage among all arginine codons in 199 E. coli genes (see Table 1). The AGA and AGG
codons are the least used codons in E. coli (Maruyama et al., 1986). In addition to AGA and AGG
codons, many other codons are used at frequencies that differ from the general E. coli codon usage:
GCC and GCG for Ala, CGU and CGC for Arg, CAG for Gln, GGC and GGA for Gly, CAC for His, AUC and
AUA for Ile, UUA, CUU and CUG for Leu, UUC for Phe, CCU and CCG for Pro, UCG for Ser, ACC and ACA
for Thr, and GUC for Val. (c) Although the E. coli msDNAs share little sequence homology, they all
share the key secondary structures of a branched rG residue, a DNA-RNA hybrid at the 3' ends of the
msDNA and msdRNA, and stem-and-loop structures in RNA and DNA strands (Lampson et al., 1989b; Lim
and Maas, 1989).
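
The arginine codon comparison in (b) amounts to counting codons in a coding sequence and asking what
share of the six arginine codons is AGA or AGG. The following is a minimal sketch of that count
under stated assumptions: the function names are hypothetical, and TOY_CDS is an invented sequence
used only to exercise the code, not the E. coli RT gene.

```python
from collections import Counter

# The six arginine codons of the standard genetic code, written as DNA.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")
RARE_ARG_CODONS = ("AGA", "AGG")


def codon_counts(coding_sequence):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    seq = coding_sequence.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence must contain a whole number of codons")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def rare_arg_fraction(coding_sequence):
    """Fraction of arginine codons that are AGA or AGG."""
    counts = codon_counts(coding_sequence)
    total_arg = sum(counts[c] for c in ARG_CODONS)
    if total_arg == 0:
        raise ValueError("sequence contains no arginine codons")
    return sum(counts[c] for c in RARE_ARG_CODONS) / total_arg


# Hypothetical sequence: ATG AGA CGT AGG CGC AGA TAA
TOY_CDS = "ATGAGACGTAGGCGCAGATAA"

if __name__ == "__main__":
    print(rare_arg_fraction(TOY_CDS))
    # The figure quoted in the text: 40 of 62 arginine codons.
    print(round(100 * 40 / 62))
```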

These results clearly demonstrate distinct differences between the msDNA systems of
E. coli and M. xanthus. Myxobacteria are common organisms in soil and are found all over the world
regardless of climate, and are considered to have diverged from their nearest bacterial relatives
about 2 x 10^9 years ago when the atmosphere became aerobic (see a review by Kaiser, 1986). Since it
is reasonable to assume that the M. xanthus RT gene is as old as other genomic genes, the RT gene
existed much before eukaryotic cells appeared (1.5-0.9 x 10^9 years ago). The relatedness between
various prokaryotic and eukaryotic RTs as shown in Figures 5A and B strongly supports the existence
of a single ancestral gene for all RTs. It is possible that such an ancestral RT gene was independently
recruited into different systems such as the msDNA system, the retrotransposon system, and the
retroviral system. Alternatively, the msDNA system may be a primitive ancestral system from which
retrotransposons and retroviruses originated. In this regard, it is intriguing to point out other
sequence similarities between the M. xanthus RT-ORF and other retroelements (see Figure 4) other
than RT itself as well as the similar mode of initiation of DNA synthesis by RT as discussed earlier.

At present, it is beyond our speculation why the E. coli msDNA systems are so
diverged in contrast to the M. xanthus msDNA system and how they were acquired into the genomes
of some E. coli strains. However, it should be noted that the E. coli RTs are most related to the
M. xanthus RT, indicating that they were not derived from eukaryotic origins. Possible origins of
retroviruses have been discussed (Temin, 1980). The recent finding of an imposon in a genetic
component for a mouse gene also raises an interesting question concerning the evolution of
retroelements (Stavenhagen and Robins, 1988). Further characterization of the prokaryotic RTs and
the msDNA systems will provide clues to the origins of RT and other retroelements.

EXPERIMENTAL PROCEDURE



DNA Manipulation and Plasmids 

DNA manipulation was performed as described by Maniatis et al. (1982). The plasmid
isolation was as originally described by Birnboim and Doly (1979). Plasmid pmsSB7, containing the
5 kb SalI-BamHI fragment cloned between the SalI and BamHI sites of pUC9 (Vieira and Messing,
1982), was used. After the 2.2 kb SalI-SmaI fragment from pmsSB7 was subcloned between the SalI
and SmaI sites of pUC9, all RsaI fragments were gel-purified and cloned into pUC9 for DNA
sequencing.

DNA sequence

DNA sequence was determined by the chain termination method (Sanger et al., 1977)
using single-stranded or double-stranded DNA as templates with synthetic oligonucleotides.

Other Material and Methods

Restriction enzymes were purchased from either Bethesda Research Laboratories or
New England BioLabs. [α-35S]dATP was from Amersham. Sequenase, Version 2.0 Kit was
purchased from United States Biochemical Corporation for DNA sequencing.

Cyborg program from International Biotechnologies Inc. was used to search sequence
homology in GenBank Release 55.

Screening of bacteria for retron-synthesized msDNAs was performed by the methods
of Lampson et al., J. Bacteriol., 173:5363-5370 (1991), or Yee et al., Cell, 38, 203-209 (1984).

RTs were identified and isolated by the method of Lampson et al., J. Biol. Chem.,
265:8490-8496.



msDNA in Escherichia coli 


The recent serendipitous finding of msDNA (msDNA-Ec86) in E. coli B by Dongbin
Lim and Werner Maas (D. Lim et al., 1989) prompted a search for msDNA in other E. coli strains.
As previously established by Yee et al. (T. Yee et al., 1984), msDNA is not found in the common
laboratory strain K12; however, to our surprise, it was found in a clinical E. coli strain isolated
from a patient with a urinary tract infection. Fifty independent E. coli urinary tract isolates were
examined for the presence of msDNA (The clinical E. coli strains were urinary tract isolates kindly
provided by Dr. Melvin Weinstein from the microbiology laboratory, R.W. Johnson Hospital, New
Brunswick, NJ. The clinical strain Cl-1 was identified using the API-20E identification system (API
laboratory products) and gave a typical E. coli profile number of 5044552.). The screening method
involved treatment of total RNA prepared from each strain with avian myeloblastosis virus (AMV) RT
in the presence of [α-32P]dCTP plus dATP, dTTP, and dGTP followed by polyacrylamide gel
electrophoresis. Since msDNA contains a DNA-RNA duplex structure, the 3' end of the DNA molecule
serves as an intramolecular primer and the RNA molecule as a template for RT. When RNA prepared
from one of the clinical strains, E. coli Cl-1, was labeled in this manner, two distinct, low
molecular weight bands of about 160 bases became labeled with 32P and are shown in Figure 6. If the
labeled sample is digested with ribonuclease (RNase) A prior to loading on the gel, a single band
corresponding to 105 bases of single-stranded DNA is detected (lane 4). This indicates that both
bands in lane 3 contain a single-stranded DNA of identical size. The two labeled bands observed
prior to RNase treatment (lane 3) are due to two species of msDNA comprised of a single species of
single-stranded DNA linked to RNA molecules of two different sizes. RNA molecules of two different
sizes have been observed at the 5' ends of msDNA from myxobacteria in which a precursor molecule
contains a longer RNA which is processed into a smaller mature form (Dhundale et al., 1987;
Furuichi et al., 1987). Among the 89 clinical isolates screened, three other strains produced
msDNA-like molecules of varying size and quantity, suggesting extensive diversity among these
molecules. As previously reported (Dhundale, 1985), msDNA was not observed in the E. coli K-12
strain, C600 (lanes 1 and 2, Figure 6).

Nucleotide sequence of msDNA-Ec67

To determine the base sequence of the DNA molecule, the RNA-DNA complex
isolated from the clinical strain was labeled at the 3' end of the DNA molecule with AMV-RT and
[α-32P]dATP. By adding dideoxy-CTP, ddTTP, and ddGTP to the reaction mixture, a single labeled
adenine is added to the 3' end of the DNA molecule. RNA is removed with RNase A + T1 and the
end-labeled DNA is subjected to the Maxam and Gilbert sequencing method (Maxam et al., 1980).
Figure 7 shows that msDNA consists of a single-stranded DNA of 67 bases and, as in the case of
msDNAs from myxobacteria (Yee, 1984; Dhundale, 1987), it can form a secondary hairpin structure.
The primary sequence, however, is not homologous to any of the myxobacterial msDNAs, nor to the
msDNA from E. coli B (msDNA-Ec86; Lim and Maas, personal communication).

The sequence of the RNA molecule was determined using the RNA-DNA complex
purified from E. coli Cl-1. The RNA sequence was determined using base-specific RNases as
described previously (Dhundale et al., 1988). As shown in Figure 8, a large gap is observed in the
RNA sequence "ladder". This gap is due to the DNA strand branched at the 2' position of the 15th
rG residue of the RNA strand which produces a shift in mobility of the sequence ladder (see Figure
7). The RNA consists of 58 bases with the DNA molecule branched at the G residue at position 15
by a 2',5'-phosphodiester linkage. The branched G structure was determined as previously described
for msDNAs from myxobacteria (Dhundale, 1987; Furuichi et al., 1987). After RNase (A and T1)
treatment, msDNA retains a small oligoribonucleotide linked to the 5' end of the DNA molecule due
to the inability of RNases to cleave in the vicinity of the branched linkage. The 5' end was labeled
with [γ-32P]ATP using T4 polynucleotide kinase and the labeled RNA molecule was detached from
the DNA strand by a debranching enzyme purified from HeLa cells (Ruskin et al., 1985; Arenas et al.,
1987; the debranching enzyme was a gift from Jerard Hurwitz). This small RNA was found to be a
tetraribonucleotide which could be digested with RNase T1 to yield a labeled dinucleotide (not
shown). Since RNase T1 could not cleave the RNA molecule at the G residue before debranching
enzyme treatment, it was concluded that the single-stranded DNA is branched at the G residue via
a 2',5'-phosphodiester linkage. In addition, partial RNase U2 digestion cleaved the RNA molecule
to yield a 32P-labeled mono- and a 32P-labeled trinucleotide (not shown). Thus, the sequence of the
tetranucleotide is 5'A-G-A-(U or C)3'. Based on these data, the complete structure of msDNA-Ec67
from E. coli Cl-1 is presented in Figure 7. Despite a lack of primary structural homology,
msDNA-Ec67 displays all the unique features found in msDNAs from myxobacteria. These include a
single-stranded DNA with a stem-and-loop structure, a single-stranded RNA with a stem-and-loop
structure, a 2',5'-phosphodiester linkage between the RNA and DNA, and a DNA-RNA hybrid at
their 3' ends. This hybrid structure was confirmed by demonstrating sensitivity of the RNA molecule
to RNase H (not shown).
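
The reasoning that leads from the digestion products to the tetranucleotide sequence can be followed
with a small model. The sketch below is an illustration only and rests on two simplifying
assumptions that are not taken from the text: RNase T1 is modeled as cutting only on the 3' side of
G, and RNase U2 as cutting only on the 3' side of A. Only fragments that keep the labeled 5' end are
reported. The function name labeled_fragment_lengths is hypothetical.

```python
def labeled_fragment_lengths(rna, cut_after):
    """Lengths of 5'-end-labeled fragments from partial cleavage.

    A cut on the 3' side of base i (0-based) leaves a labeled fragment of
    length i + 1. A cut after the last base would not shorten the molecule,
    so the final base is not considered.
    """
    return [i + 1 for i, base in enumerate(rna[:-1]) if base in cut_after]


# Simplified specificities assumed for this illustration.
T1_CUTS_AFTER = "G"
U2_CUTS_AFTER = "A"

if __name__ == "__main__":
    for candidate in ("AGAU", "AGAC"):
        t1 = labeled_fragment_lengths(candidate, T1_CUTS_AFTER)
        u2 = labeled_fragment_lengths(candidate, U2_CUTS_AFTER)
        print(candidate, "T1:", t1, "U2:", u2)
```

In this model both candidate sequences give a labeled dinucleotide with T1 and a labeled mono- and
trinucleotide with U2, which is why the fourth position remains undetermined (U or C).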
Cloning of the locus for msDNA-Ec67

In order to identify the DNA fragment which is responsible for msDNA synthesis in
E. coli Cl-1, Southern blot hybridization was carried out with various restriction enzyme digests of
total chromosomal DNA prepared from E. coli Cl-1, using msDNA-Ec67 labeled with AMV-RT (the
same preparation as shown in lane 3, Figure 6) as a probe. The result is shown in Figure 9A. EcoRI
(lane 1), HindIII (lane 2), BamHI (lane 3), PstI (lane 4) and BglII (lane 5) digestions showed single
band hybridization signals corresponding to 11.6, 2.0, 22, 2.8 and 2.5 kilobase pairs (kb),
respectively. The upper band appearing in the EcoRI digestion is due to incomplete digestion of the
chromosomal DNA. Analysis of total chromosomal DNA prepared from E. coli Cl-1 by agarose gel
electrophoresis revealed that the strain contains two plasmids of different size. However, neither
plasmid hybridized with the 32P-labeled probe, indicating that the fragments detected in Figure 9A
are derived from chromosomal DNA. Furthermore, there is only one location for the msDNA-coding
region on the chromosome, since various restriction enzyme digestions gave only one band of varying
sizes. Similar results were observed for the msDNAs of myxobacteria (Yee et al., 1984; Furuichi et
al., 1987; and Dhundale et al., 1988).

The 11.6-kb EcoRI fragment and the 2.8-kb PstI fragment were each cloned into
pUC9 (Yanisch-Perron et al., 1985) and E. coli CL83 (a recA transductant of strain JM83), an
msDNA-free K-12 strain (lane 1, Figure 9B), was transformed with the plasmids. Cells transformed
with the 11.6-kb EcoRI clone (pCl-1E) were found to produce msDNA (lane 2, Figure 9B), whereas
cells transformed with the 2.8-kb PstI clone (pCl-1P) failed to produce any detectable msDNA (lane
3, Figure 9B). A map of the 11.6-kb fragment is shown in Figure 10. Southern blot analysis of the
fragment revealed that a 1.8-kb PstI-HindIII fragment hybridized with the msDNA probe. When
the DNA sequence of this fragment was determined, a region identical to the sequence of the msDNA
molecule was discovered. The DNA sequence corresponding to the sequence of msDNA is indicated
by the enclosed box on the lower strand in Figure 11 and the orientation is from right to left. The
location of this sequence is also indicated by a small arrow in Figure 10. As is the case for all other
known myxobacterial msDNAs (Dhundale et al., 1987; Furuichi et al., 1987; and Dhundale et al.,
1988), a sequence identical to that of the RNA linked to msDNA (see Figure 7) was found
downstream of the msDNA-coding region in opposite orientation and overlapping with that region
by 7 bases. This sequence is indicated by the enclosed box on the upper strand in Figure 11 and the
branched G residue is circled. Again, as in all the msDNAs found in myxobacteria, there is an
inverted repeat comprised of a 13-base sequence immediately upstream of the branched G residue
(residue 250 to 262; sequence a2 in Figure 11) and a sequence at the 3' end shown by an arrow in
Figure 11 (residue 368 to 380; sequence a1). As a result of this inverted repeat, a putative longer
primary RNA transcript beginning upstream of the RNA coding region and extending through the
msDNA coding region would be able to self-anneal and form a stable secondary structure, which is
proposed to serve as the primer as well as the template for msDNA synthesis (Dhundale et al., 1987).
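
Two segments form an inverted repeat, and can therefore pair as a stem in a single transcript, when
one segment is the reverse complement of the other. The following is a minimal sketch of that check
for a DNA sequence with 1-based inclusive coordinates, in the style of the residue numbers used
above. The function names and the sequence TOY_SEQUENCE are hypothetical; the sequence is built
artificially for the example and is not the sequence of Figure 11.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def segment(dna, first, last):
    """Sub-sequence for 1-based inclusive coordinates."""
    if not 1 <= first <= last <= len(dna):
        raise ValueError("coordinates out of range")
    return dna[first - 1:last]


def is_inverted_repeat(dna, arm_a, arm_b):
    """True if the two arms, given as (first, last), can pair as a stem."""
    return segment(dna, *arm_a).upper() == reverse_complement(segment(dna, *arm_b))


# Hypothetical 13-base arm, an 8-base spacer, and the matching second arm.
_ARM = "GATTACAGGCTAC"
TOY_SEQUENCE = "TT" + _ARM + "CCCCAAAA" + reverse_complement(_ARM) + "GG"

if __name__ == "__main__":
    print(is_inverted_repeat(TOY_SEQUENCE, (3, 15), (24, 36)))
    print(is_inverted_repeat(TOY_SEQUENCE, (1, 13), (24, 36)))
```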

Existence of an essential gene for msDNA synthesis

The 2.8-kb PstI fragment (from PstI(a) to PstI(b) in Figure 10) was not able to
synthesize msDNA. However, an overlapping 3.9-kb fragment from BalI (1.0 kb downstream of
PstI(a); see Figure 10) to the following EcoRI site contains all the information required for synthesis
of msDNA. This indicates that a region downstream of the PstI(b) site (Figure 10) is required for
msDNA production. The nucleotide base sequence from this region revealed a long open reading
frame (ORF) of 586 amino acid residues, starting with the initiation codon ATG at nucleotide 418
to 420 as shown in Figure 11. A distance of only 51 bases separates the initiation codon from the
region which encodes msDNA. A putative Shine-Dalgarno sequence (GGA) can be found 10 bases
upstream of the initiation codon. When the lacZ gene was fused in frame at the HindIII site (within
the ORF) at amino acid residue 126, β-galactosidase activity was detected (not shown). Thus the
region encompassing the ORF is indeed transcribed and the gene product encoded by the ORF is
essential for msDNA synthesis. In a preliminary experiment, both msdRNA and the ORF appeared
to be transcribed as the same transcription unit, since a deletion mutation removing the sequence from
residue 1 to 181 blocked the expression of the lacZ gene fused at the HindIII site. A putative
promoter can be found in the deleted sequence as boxed in Figure 11. These -35 and -10 regions
probably serve as the promoter for both msdRNA synthesis and the ORF.

Sequence similarity with retroviral reverse transcriptases

When the amino acid sequence of the ORF was compared with known proteins, a
striking similarity was found with retroviral RTs. In Figure 12, the ORF is compared with RTs from
HIV (human immunodeficiency virus; Ratner et al., 1985; and Johnson et al., 1986), and HTLV1
(human T-cell leukemia virus type I; Seiki et al., 1983; and Patarca et al., 1984). The first domain
(Asn-32 to Val-291) matches well with the RT domains of HIV and HTLV1. In particular, the
sequences around the polymerase consensus "Asp-Asp" sequence (Toh et al., 1983; and Geng et al.,
1985; boxed in Figures 11 and 12) are well conserved. Out of 260 amino acid residues in this domain,
44 and 38 residues are identical with HIV and HTLV1, respectively. Between HIV-RT and HTLV1-RT
there are 78 identical amino acid residues in this domain.

The pol gene of retroviruses is known to produce a protein consisting of RT and
RNase H activities; the former at the amino-terminal and the latter at the carboxyl-terminal region
of the pol gene product (Ratner et al., 1985; Johnson et al., 1986; Varmus, 1985; and Tanese et al.,
1988). These domains have been shown to be separated by a poorly conserved "tether" domain of
approximately 160 to 190 amino acid residues (Ratner et al., 1985; Johnson et al., 1986). On the basis
of the HIV sequence, the similarities (only identical amino acid residues) between HIV and HTLV1
are 29.5 and 16.8% for the RT domain and the tether domain, respectively. The similarities between
HIV and msDNA are 16.9 and 10.3% for the RT domain and the tether domain, respectively. The
similarities between HTLV1 and msDNA are 14.6 and 15.5% for the RT domain and the tether
domain, respectively. These results indicate that in addition to the RT region, there are reasonable
similarities in the tether domain between retroviruses and msDNA. An alignment of the RNase H
domains also revealed that there are similarities between retroviruses and msDNA (15.7 and 17.4%
with HIV and HTLV1, respectively; see Figure 12). The similarity between HIV and HTLV1 in this
region is 18.0%.
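
The similarity figures above count only identical residues and express them as a percentage of the residues compared. The following is a minimal illustrative sketch, not the procedure actually used for Figure 12; the function names are hypothetical. It reproduces the 16.9% and 14.6% values for the RT domain from the residue counts given above (44 and 38 identical residues out of 260).

```python
def count_identical(aligned_a, aligned_b, gap="-"):
    """Count aligned positions at which both sequences carry the same residue.

    Both inputs must already be aligned to equal length; positions holding
    the gap character are never counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)


def percent_identity(identical, compared):
    """Identical residues as a percentage of residues compared, to one decimal."""
    return round(100.0 * identical / compared, 1)


# Residue counts taken from the text: 260-residue RT domain of the ORF.
print(percent_identity(44, 260))  # ORF versus HIV RT   -> 16.9
print(percent_identity(38, 260))  # ORF versus HTLV1 RT -> 14.6
```

The percentage depends on which sequence length is used as the denominator, which is why the text specifies the basis of each comparison.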
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Cell extracts were prepared and assayed for the presence of RT activity associated with 
the production of msDNA as predicted from the amino acid homologies. Only the E. coli strain 
(C2110, polA) (Tanese et al., 1985; Tanese et al., 1986; E. coli strain C2110 (polA1) was a gift from
M. Roth and S. Goff) harboring the plasmid pCl-1EP5, containing the msDNA ORF, displayed RT
activity (Figure 13). The polA strain was used to eliminate high background activity in the RT assay
due to DNA polymerase I. No RT activity was detected in extracts containing the vector plasmid
alone, or when the template-primer (poly rC-dG) was absent from the reaction mix (Figure 13). It
is interesting to note that the PstI(b) site is located at approximately amino acid residue 430, which is between the
tether domain and the RNase H domain. A plasmid lacking sequences downstream of the PstI(b) site
did not produce msDNA. This suggests that the RNase H domain may be essential for msDNA 
synthesis, or alternatively that PstI disruption may result in inactivation of RT. 

In addition to the similarity between msDNA-Ec67 RT and retroviral RT, there is an 
interesting similarity between msDNA and retroviruses; DNA synthesis starts at a site upstream of 
the RT-RNase H gene, and the orientation of DNA synthesis is opposite to the direction of 
transcription of the RT-RNase H gene. In the case of retroviruses, tRNAs are recruited from the cell 
for the priming reaction (Weiss et al., 1985), whereas for msDNA an RNA transcript serving as a
template also serves as a primer by self-annealing to form a stable secondary structure (Dhundale et
al., 1987; Furuichi et al., 1987).



Origin of the E. coli Reverse Transcriptase 
At present the relationship between msDNA and retroviruses is an open question. It
is possible that the study of msDNA may shed light on the question of the origin and evolution of
retroviruses. It is an intriguing question to consider why some of the clinical E. coli strains isolated
from human patients produce msDNA. Our preliminary data indicate that msDNAs produced by four
independent E. coli strains, isolated from urinary tract infections, share little homology. This
suggests that there may be enormously large numbers of species of msDNA in E. coli. In contrast to
msDNAs found in E. coli, msDNA-Mx162 from M. xanthus is highly conserved, since nine
independent M. xanthus strains isolated from various sites have msDNA which hybridizes with the
original msDNA-Mx162 (Dhundale et al., 1985). Furthermore, msDNA from another myxobacterium,
S. aurantiaca (msDNA-Sa163; Furuichi et al., 1987), also shows a high degree of homology to
msDNA-Mx162 (Furuichi et al., 1987).
Several lines of evidence suggest that the RT gene found in the E. coli strain Cl-1 is
not likely to have originated in E. coli, but rather was recently acquired from some other source. For
example, only about 4% of E. coli strains tested were found to produce msDNA. In addition, the RT
gene from strain Cl-1 does not cross-hybridize to chromosomal DNA from four other E. coli strains
which produce msDNA molecules, indicating that there is extensive diversity among these RT
genes. In contrast, a DNA fragment from the E. coli K-12 sigma factor gene can hybridize to chromosomal
DNA from all five msDNA-producing E. coli strains, indicating the conserved nature of sigma
factors. An analysis of the E. coli RT gene indicates that the codon usage for this gene is remarkably
different from that of most E. coli proteins. In particular, AGA and AGG, the least frequently (2.7%) used
codons for arginine among 199 E. coli genes (Maruyama et al., 1986), occur at a frequency of 64.5%
in the E. coli RT gene. Similarly, CUG is the most commonly used codon for leucine (61.3%;
Maruyama et al., 1986) in E. coli genes, while its prevalence in the RT gene is only 9.1%. The AT
base pair content of the E. coli RT gene was calculated to be 67.6%, which is substantially higher than
the AT content of the E. coli genome (45%; Fasman, 1976). The AT contents of HIV and HTLV1 RT
genes are 62.1% and 47.8%, respectively. These facts pose an intriguing question as to how and when
the RT gene, as well as the msDNA coding region, were integrated into the genome of the clinical
strain.
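
The codon usage and AT content comparisons above are simple counts over a coding sequence. The sketch below shows one way such figures might be computed; it is an illustration under stated assumptions and not the analysis of Maruyama et al. The function names are hypothetical, and the short sequence `toy_orf` is invented for demonstration and is not the Ec67 RT gene.

```python
from collections import Counter

# The six arginine and six leucine codons of the standard genetic code (DNA letters).
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def codons(orf):
    """Split an in-frame coding sequence into triplets (RNA 'U' is read as 'T').

    A trailing partial codon, if any, is dropped.
    """
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def codon_share(orf, chosen, synonymous):
    """Percentage of the synonymous codons in `orf` that belong to `chosen`."""
    counts = Counter(c for c in codons(orf) if c in synonymous)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[c] for c in chosen) / total


def at_content(seq):
    """Percentage of A and T bases in a DNA sequence."""
    seq = seq.upper().replace("U", "T")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


# Invented toy sequence for illustration only.
toy_orf = "ATGAGACGTAGGCTGTTAAGATTATAA"
print(codon_share(toy_orf, {"AGA", "AGG"}, ARG_CODONS))  # share of Arg codons that are AGA/AGG
print(codon_share(toy_orf, {"CTG"}, LEU_CODONS))         # share of Leu codons that are CUG
print(at_content(toy_orf))                               # AT content in percent
```

Applied to a full gene sequence, the same three calls would yield figures of the kind quoted above (for example, the AGA/AGG share among arginine codons and the overall AT content).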

There are many questions to be answered, including (a) are there any particles 
associated with msDNA, (b) is the msDNA region transposable like the Ty element of yeast (Boeke 
et al., 1985; Eichinger et al., 1988), (c) can the element responsible for the production of msDNA be 
transferred from cell to cell, (d) can an RT from one strain (E. coli or myxobacteria) complement the
production of msDNA of other strains, (e) does the promoter for the RNA transcript have any
similarities to the retroviral LTR, (f) are there any specific integration sites for the msDNA element
on the E. coli chromosome, (g) why is the branched G residue conserved, (h) is there an enzyme
responsible for priming DNA synthesis at the 2'-OH position of the rG residue, (i) why and how does
msDNA synthesis stop at one distinct site on the RNA template, and (j) how different biochemically
are the msDNA RTs from retroviral RTs?
The existence of reverse transcriptase in prokaryotes, previously speculated upon
(Dhundale et al., 1987), is now evident. This fact raises intriguing questions concerning possible roles
of this enzyme in the prokaryotes other than a role in msDNA production. Recently we also found
that M. xanthus, in which msDNA was originally discovered, has a long ORF in the same manner as
found for msDNA-Ec67. This ORF has a high degree of similarity to the E. coli RT. Since eight
independent isolates of M. xanthus produce homologous msDNA, the M. xanthus RT is likely to have
been acquired at a very early stage of its evolution in contrast to the E. coli RT. The determination
of the structures of both M. xanthus and other E. coli RTs will shed light on the key question of the
origin of RT and its role in prokaryotes.

An important embodiment of the invention relates to the discovery of msDNA-producing
retron elements in a number of diverse bacterial groups. Thus, retron elements appear to
be widely prevalent, at least amongst the purple bacteria or proteobacteria, including Proteus,
Klebsiella and Salmonella of the gamma subdivision; Rhizobium and Bradyrhizobium from the alpha
subdivision; and Nannocystis (a myxobacterium) from the delta subdivision. These are
representatives of three of the four major subdivisions of the purple bacteria or proteobacteria.
As shown above, the retron-encoded RT is responsible for the synthesis of msDNAs.

The retron elements were discovered by detecting the presence of msDNA by one of 
two classic methods: the so-called "RT extension method", described by Lampson, B.C., M. Inouye
and S. Inouye, 1991. Survey of multicopy single-stranded DNAs and reverse transcriptase genes
among natural isolates of Myxococcus xanthus. J. Bacteriol. 173:5363-5370 and in Lampson, B.C.,
M. Viswanathan, M. Inouye and S. Inouye, 1990. Reverse transcriptase from Escherichia coli exists
as a complex with msDNA and is able to synthesize double-stranded DNA. J. Biol. Chem. 265:8490-8496;
or polyacrylamide gel electrophoresis of a chromosomal DNA extract followed by staining with
ethidium bromide as described by Yee, T., T. Furuichi, S. Inouye, 1984. Multicopy Single-Stranded
DNA Isolated from a Gram-Negative Bacterium, Myxococcus xanthus. Cell, Vol. 38, 203-209. Both
of these publications are incorporated herein by reference. Both methods provide a reliable,
convenient and conventional protocol for screening of bacteria for the presence of retron-encoded
RT and msDNAs.

In accordance with the RT extension method, the DNA portion of msDNA is 
specifically radiolabeled with 32P from a total RNA preparation extracted from each
bacterial strain to be screened. Twenty or more isolates of Proteus mirabilis, Klebsiella pneumoniae,
Salmonella species, rhizobial species, and enterococcal species were screened by this method.
Low-molecular-weight bands (Fig. 20) indicated the presence of small labeled DNAs after polyacrylamide
gel electrophoresis and autoradiography of the labeling reaction mixes. In addition, half of each
labeling reaction mix was also treated with RNase A, causing a shift to a faster-migrating band,
indicating that the labeled DNA is also associated with RNA. This is a hallmark of the msDNA
molecule as discussed above. Four of the 23 P. mirabilis isolates screened produced msDNA, while
only 1 of 21 K. pneumoniae isolates and 4 of 70 Salmonella isolates screened produced msDNA.
No msDNA was detected in any of the 30 or so enterococcal strains screened by this method. It was
concluded that the bacterial genera which contain msDNA-producing retron elements are
representatives of three of the four major subdivisions of the purple bacteria or Proteobacteria, as
described above.
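
For comparison across genera, the screening counts given above can be expressed as percentages of isolates producing msDNA. The sketch below is illustrative only; the dictionary layout and function name are hypothetical, and the counts are those stated in the text.

```python
# (msDNA-positive isolates, isolates screened), as stated in the text.
screening_results = {
    "Proteus mirabilis": (4, 23),
    "Klebsiella pneumoniae": (1, 21),
    "Salmonella": (4, 70),
}


def hit_rate(positive, screened):
    """Percentage of screened isolates that produced msDNA, to one decimal."""
    return round(100.0 * positive / screened, 1)


for genus, (positive, screened) in screening_results.items():
    print(f"{genus}: {positive}/{screened} = {hit_rate(positive, screened)}%")
```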



In accordance with this embodiment of the invention, it is noteworthy that the 
discovery of msDNA extends for the first time the distribution of retron elements to a new
phylogenetic division of the purple bacteria, namely, the alpha subdivision. A collection of 63
rhizobial isolates (shown in Table 1) was screened for the presence of msDNA by the RT extension
method. Among the 63 isolates, msDNAs were detected in 10 (16%; Fig. 20 and Fig. 21).
All 10 positive isolates give strong, clearly labeled bands with a typical shift to a fast-migrating band
after treatment with RNase A, indicating the presence of RNA and DNA in the labeled molecule.
The 10 retron-encoding rhizobial strains include both fast-growing (Rhizobium) and slow-growing
(Bradyrhizobium) rhizobia.

The RT extension method comprises treating a preparation of total RNA, extracted 
from a bacterial strain to be tested, with RT from a suitable source in the presence of the 
deoxynucleotides dATP, dTTP, dGTP and dCTP, one of which is radiolabeled, e.g., [α-32P]dCTP,
electrophoresing the treated RNA preparation on a polyacrylamide gel and determining initially the
presence or absence of msDNA in the bacterium of interest by detecting a band of radiolabeled DNA
corresponding to the single-stranded DNA of msDNA. Typical examples of suitable sources of RT
are avian myeloblastosis virus (AMV) and Moloney murine leukemia virus (Mo-MLV). Conceivably,
the test could be automated. 
Total RNA samples, which contain msDNA if present in the bacterium, are extracted
from the bacterial strain of interest and prepared for RT extension as follows. Total RNA, prepared
from a 5-ml culture of the bacterial strain, is added to 50 µl of a reaction mixture containing: 50
mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 µM dATP, dTTP and dGTP; 0.04
µM dCTP; 0.2 µM [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim). The reaction
mixture is incubated at 37°C for 30 minutes, then extracted with 50 µl of phenol-chloroform (1:1) and
precipitated with ethanol. The samples are subjected to electrophoresis on a 4% acrylamide-8 M urea
gel with appropriate nucleotide size markers, e.g., the MspI digest of pBR322 end-labeled with
[α-32P]dCTP and the Klenow fragment of DNA polymerase I. If the
labeled sample is digested with ribonuclease (RNase) A before it is placed on the gel, a single band
corresponding to single-stranded DNA is detected, which is indicative of the presence of msDNA.
An aliquot from each labeling reaction mixture is treated with 5 µg of RNase for 10 minutes at 37°C
just prior to electrophoresis to detect in the gel a shift to a faster-migrating species, indicating that
each labeled DNA is also associated with RNA, which is the hallmark of the msDNA molecule.
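
As a worked illustration of the labeling reaction described above, the amount of each component in the 50 µl mixture is simply concentration multiplied by volume. The sketch below is not part of the protocol; the names are hypothetical, and the dCTP concentrations are taken as micromolar, which is an assumption about how the units in the text are to be read.

```python
import math

# Conversion factors from the stated unit to mol/L.
UNIT_TO_MOLAR = {"mM": 1e-3, "uM": 1e-6}

REACTION_VOLUME_L = 50e-6  # 50 microliters

# Final concentrations as stated in the text (dCTP units assumed micromolar).
COMPONENTS = {
    "Tris-HCl (pH 8.3)": (50, "mM"),
    "MgCl2": (6, "mM"),
    "KCl": (40, "mM"),
    "DTT": (5, "mM"),
    "dCTP (unlabeled)": (0.04, "uM"),
    "[alpha-32P]dCTP": (0.2, "uM"),
}


def moles_in_reaction(concentration, unit, volume_l=REACTION_VOLUME_L):
    """Moles of a component present in the reaction (n = C x V)."""
    return concentration * UNIT_TO_MOLAR[unit] * volume_l


def labeled_fraction(labeled_conc, unlabeled_conc):
    """Fraction of the dCTP pool that carries the radiolabel."""
    return labeled_conc / (labeled_conc + unlabeled_conc)


for name, (conc, unit) in COMPONENTS.items():
    print(f"{name}: {moles_in_reaction(conc, unit):.3g} mol")

print(f"labeled share of dCTP: {labeled_fraction(0.2, 0.04):.1%}")
```

Under these assumptions most of the dCTP pool is labeled, which is consistent with the stated aim of specifically labeling the DNA that is extended by RT.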

Low-molecular-weight bands in the gel indicate the presence of small labeled DNAs
after polyacrylamide gel electrophoresis and autoradiography of the labeling reaction mixtures.
Multiple bands observed in some of the lanes of the gel even after RNase treatment
may be due to incomplete extension by RT during the labeling reaction, or, alternatively, multiple
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The Yee method for screening bacteria for the presence of retrons which synthesize 
msDNAs involves purifying total chromosomal DNA from the bacteria to be screened by a
conventional phenol extraction procedure, electrophoresing it on a 5% preparative acrylamide gel
and checking for a satellite band. The major satellite band is cut out and the material in the
band is extracted and quantitated. Total chromosomal DNA is subjected to
acrylamide gel electrophoresis, the gel is stained with ethidium bromide, and densitometric scanning
is employed to quantitate the satellite DNA against the pBR322 standard. The method is described
in greater detail in Yee et al., cited above.

A collection of rhizobial isolates from the United States Department of Agriculture 
(USDA) Beltsville Rhizobium Culture Collection is screened for the presence of msDNA by the RT
extension method. This collection represents isolates obtained at different times, from different legume hosts
and from different geographic locations. msDNAs are detected in 10 isolates. All 10 positive isolates
give strong, clearly labeled bands of DNA, with a typical shift to a fast-migrating band after
treatment with RNase A, indicating the presence of RNA and DNA in the labeled molecule. The 10
retron-encoding rhizobial strains include both fast-growing (Rhizobium) and slow-growing
(Bradyrhizobium) rhizobia as follows: Rhizobium sp. (Acacia) 3002 and 3838, Bradyrhizobium sp.
(Aeschynomene) 3516, Bradyrhizobium sp. (Albizia) 3004, Bradyrhizobium sp. (Erythrina) 3242,
Rhizobium loti 3468 and 3503, Rhizobium trifolii 2048 and 2065 and Bradyrhizobium sp. (Vigna)
3447. See Figure 21.

Total DNA from each of eight msDNA-producing strains clearly cross-hybridizes with
a nodYAB (1.6-kb EcoRI fragment) gene probe derived from Bradyrhizobium japonicum,
confirming that these strains are members of the Rhizobiaceae.

In view of the diversity of retron elements in prokaryotic populations, it is not 
excluded that msDNA-synthesizing retrons would be found in bacteria living in a range of
environments, such as in alkaline environments: Plectonema nostocorum, Flavobacterium spp.,
Agrobacterium spp., Bacillus spp., Ectothiorhodospira spp.; in acidic environments: Thiobacillus
thermophilica and thiooxidans, Thermoplasma acidophilus, Sulfolobus acidocaldarius, Cyanidium
caldarius, Bacillus acidocaldarius; in very high temperature environments (thermophilic): Sulfolobus
acidocaldarius, Caldariella acidophila, Thermus aquaticus; in very low temperature environments
(psychrotrophic): Vibrio marinus, Pseudomonas spp., Cytophaga spp., Flavobacterium spp.; in high salt
environments (halophilic): Halobacterium cutirubrum and salinarium, Halococcus morrhuae, Dunaliella
viridis; and in high barometric pressure environments (like the deep sea; barophilic): bacteria
which are believed to inhabit the gut of ocean
bottom dwelling fish. By using one of the two screening tests identified above, one skilled in the art
will readily determine whether any one of these bacteria contains retrons synthesizing msDNA. This
may be particularly interesting for making evolutionary comparisons between homologous RT genes
present in phylogenetically distant strains.

A number of amino acid sequences of representative RTs were analyzed
to determine similarities and differences. The following observations were made. The amino acid
sequences of these bacterial RTs are shown in Figure 14. The individual nucleotide and amino acid
sequences for each of the RTs are shown in Figures 2, 11 and 15 through 19.

From a comparison of these sequences, it is noted that there are 61 conserved positions
in the RT domains as indicated by solid dots at the bottom of the sequences in Figure 14. It is further
noted that all bacterial RTs possess the YXDD sequence. Several other residues are conserved
including the LPQS sequence that is especially common in retroviral reverse transcriptases. The RT
domains are divided into seven subdomains. For each subdomain, the consensus sequences for the
seven bacterial RTs can be established, as shown at the bottom of the sequences in Figure 14. There
are 18 extra residues (except 26 residues for RT-Ec67) between subdomains 2 and 3, in which there
is a reasonably good consensus sequence.

It has been noted that the RTs of the present invention possess a number of common 
conserved sequences of nucleotides and amino acid residues. 

The most common conserved sequence of amino acid residues noted is as follows:
tyrosine, alanine or cysteine, aspartic acid and aspartic acid. This sequence, common to all
RTs of the present invention, is also known as the YXDD sequence.

A second conserved sequence of amino acid residues noted is as follows: serine, X
which is a hydrophobic residue selected from the group consisting of valine, phenylalanine, leucine
and isoleucine, a residue which is a polar residue selected from the group consisting of threonine, asparagine,
lysine and serine, and X2 which is a hydrophobic residue selected from the group consisting of
tryptophan, phenylalanine and alanine.

A third conserved sequence of amino acid residues noted is as follows: asparagine, X
which is a hydrophobic residue selected from the group consisting of alanine, leucine and
phenylalanine, and a residue which is a hydrophobic residue selected from the group consisting of leucine,
valine and isoleucine.

A fourth conserved sequence of amino acid residues further noted is as follows: X
which is a polar residue selected from the group consisting of arginine, glutamic acid, lysine, valine
and glutamine, a second residue which is valine, a third residue which is threonine and a fourth
residue which is glycine.
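
The four conserved sequences described above can be written as short patterns in one-letter amino acid code and searched for in a candidate protein sequence. The following sketch is offered for illustration only and is not a method disclosed herein: the regular expressions are one reading of the residue groups listed above (with the first pattern being the YXDD sequence in which X is alanine or cysteine), the names are hypothetical, and the sequence `toy` is invented and is not a real RT.

```python
import re

# One-letter patterns transcribed from the four conserved sequences in the text.
MOTIFS = {
    "first (YXDD)": re.compile(r"Y[AC]DD"),       # Tyr, Ala/Cys, Asp, Asp
    "second": re.compile(r"S[VFLI][TNKS][WFA]"),  # Ser, hydrophobic, polar, hydrophobic
    "third": re.compile(r"N[ALF][LVI]"),          # Asn, hydrophobic, hydrophobic
    "fourth": re.compile(r"[REKVQ]VTG"),          # listed residue, Val, Thr, Gly
}


def find_motifs(protein):
    """Return {motif name: [0-based start positions]} for one protein sequence."""
    sequence = protein.upper()
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Invented toy sequence for illustration only.
toy = "MKSLTWGGNALGGYADDGGRVTG"
for name, positions in find_motifs(toy).items():
    print(name, positions)
```

Because these patterns are short, matches to the second, third and fourth sequences can occur by chance in unrelated proteins; such a search would at most serve as a preliminary filter ahead of a full alignment of the kind shown in Figure 14.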

These conserved sequences are only a portion of the total number of common 
sequences of the RTs. For other conserved sequences held in common by the bacterial RTs reference 
is made to Figure 14. 

The RTs of the other groups of bacteria described herein as capable of synthesizing 
msDNAs are likewise believed to have a similar profile of conserved nucleic acid and amino acid 
residue sequence similarities as shown in Figure 14 and discussed above. This observation also applies
to the genus Nannocystis.

In accordance with the invention, it is contemplated that prokaryotic reverse 
transcriptase, which is essential for msDNA synthesis, may be responsible for host cell parasitic or 
selfish DNA synthesis. Additionally, it is thought that the prokaryotic reverse transcriptase molecule 
may be essential for synthesis of biological messengers and nucleic acid enzymes. 
The msDNAs synthesized by the reverse transcriptase disclosed herein possess a highly
stable RNA; it is capable of self-annealing and may serve as the primer and template for msDNA
synthesis. The reverse transcriptases (RTs) disclosed herein may be used as diagnostic agents. It is
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also contemplated that the RTs of the invention can synthesize msDNAs which will contain specific 
selected DNA fragments that can hybridize with complementary ssDNA, or otherwise identify 
ssDNAs, sought for, thus being useful as probes. 

The possibility for the msDNAs to behave like restriction enzymes (or have restriction-like
enzyme activity) in being capable of cleaving DNAs, or of cutting off a segment of themselves, cannot be
excluded.

The following examples are provided for purposes of illustration only and are not to 
be viewed as a limitation of the scope of the invention. The following examples are illustrative of 
bacterial isolates screened and identified to contain msDNA by way of the present invention. 



EXAMPLE 1

One of the rhizobial strains, Rhizobium trifolii USDA 2065, is identified as containing
msDNA by the RT extension method, by which msDNA from total RNA is specifically labeled with
32P as follows.

Total RNA from a 5-ml culture of R. trifolii 2065 is added to a 50 µl reaction mixture
containing: 50 mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 µM dATP, dTTP
and dGTP; 0.04 µM dCTP; 0.2 µM [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim).
The reaction mixture is incubated at 37°C for 30 minutes, then extracted with 50 µl of
phenol-chloroform (1:1) and precipitated with ethanol. The samples are subjected to electrophoresis
on a 4% acrylamide-8 M urea gel with appropriate nucleotide size markers, such as the MspI digest
of pBR322 end-labeled with [α-32P]dCTP and the Klenow fragment of DNA polymerase I. An
aliquot of the reaction mixture containing R. trifolii RNA is treated with 5 µg of RNase for 10
minutes at 37°C prior to electrophoresis to detect in the gel a shift to a faster-migrating species,
which indicates that the labeled DNA extended by RT is also associated with RNA, which clearly
demonstrates the presence of msDNA.

Low-molecular-weight bands in the gel indicate the presence of small 32P-labeled
DNA after polyacrylamide gel electrophoresis and autoradiography. The labeled DNA is indicative
of the presence of msDNA.




EXAMPLE 2 



By the method described above in Example 1, (a) Proteus mirabilis 1174b is found to
synthesize msDNA by the retrons containing the RT; (b) Klebsiella pneumoniae 912b is found to
synthesize msDNA by RT; (c) Salmonella sp. strain SARB-3 is found to synthesize msDNA by the
retrons containing the RT; (d) Nannocystis exedens Nael is found to
synthesize msDNA by RT; (e) Bradyrhizobium spp. 3447, 3516 and 3004 are also found to synthesize
msDNA by the retrons containing the RT.

The following method, exemplified for E. coli, for the isolation and purification of
bacterial RT is applicable to bacteria which are screened as positive for the presence of msDNA by
the RT extension in vitro method.



EXAMPLE 3

Isolation and Purification of Bacterial Reverse Transcriptase

The following is a description of a convenient method for isolating and purifying a
bacterial RT.

From 10 liters of a stationary phase culture of E. coli strain C2110 harboring plasmid
pCl-1EP5b, cells are harvested, washed in 50 mM Tris (pH 8.0), and resuspended in lysozyme buffer
(50 mM Tris (pH 7.5), 10% sucrose, 0.3 M NaCl, 1 mM EDTA, 1 mM phenylmethylsulfonyl fluoride).
Fresh lysozyme is added to a final concentration of 2 mg/ml. The suspension is incubated on ice for
15 minutes followed by a quick freeze at -70°C, then thawed on ice. Lysis is enhanced by the
addition of 2 volumes of buffer M (50 mM Tris (pH 7.0), 1 mM dithiothreitol, 0.2% Nonidet P-40,
10% glycerol, and 25 mM NaCl) followed by incubation on ice, then a quick freeze-thaw. A cleared
lysate is obtained by centrifugation at 38,000 rpm in a 50Ti rotor for 30 minutes. The cleared lysate
is fractionated by ammonium sulfate precipitation (0-50%, 50-70% and 70-90%), followed by dialysis
overnight (4°C) for each fraction against buffer M. Ammonium sulfate fractions, 50-70% and 70-90%,
show RT activity and are pooled, then applied to a DEAE column (2.5 x 50 cm; DE52 Whatman)
equilibrated with buffer M. The DE52 column is washed, and RT activity is eluted from the column
at a range of 300 to 350 mM NaCl. The DE52 fractions showing RT activity are pooled, concentrated
by membrane ultrafiltration (Amicon) and then loaded onto a Sephacryl S-300 column (Pharmacia
LKB Biotechnology Inc., 1.5 x 75 cm) equilibrated with buffer M. The column is developed with the
same buffer. Again, fractions from the S-300 column having RT activity are pooled and
concentrated, and 0.7 ml is loaded onto a 16-30% glycerol density gradient. The glycerol gradients
are set up and run as described previously (Viswanathan et al., 1989). The purified Ec67-RT
(fractions 7, 8 and 9) is stored as separate glycerol fractions at -20°C.

When this protocol is applied to the msDNA-synthesizing bacterial strains, the
respective RTs are isolated and identified as shown above.

Another convenient method for isolating and purifying reverse transcriptase is 
published in Lampson, B.C., S. Inouye and M. Inouye, "msDNA of Bacteria", Progress in Nucleic Acid
Research and Molecular Biology, Vol. 40, pages 1 et seq.

The invention has been described in detail with particular reference to the above 
embodiments. It will be understood, however, that variations and modifications can be effected
within the spirit and scope of the invention. 
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